ABSTRACT

This paper examines ways in which hardware and
software technologies can be used for effective educational
assessment. The analysis considers current uses of computer
technology in educational assessment and future applications.
Computer systems can integrate administration of measurement
instruments, presentation of instructional materials, recordkeeping,
and management of instructional activities. Computers can provide a
new kind of growth environment for classrooms and learning
laboratories, and their introduction is a promising way to stimulate
productive learning. Models that do not involve teachers, other
educators, and students in slow growth will probably not work. It is
best to start where users now are, and introduce formative evaluation
as a fundamental aspect of implementation. Eight specific
recommendations are given for increasing computer use in schools.
Three recommendations dealing with the purposes and methods of
testing in the schools are: (1) greatly increase the frequency and
variety of help services compared to high-stakes assessments, but
balance the two; (2) greatly increase the frequency of formative
evaluation, and provide funding and incentives to use the evaluation
data for ongoing improvement of educational programs; and (3)
increase the use of alternate methods of assessment (i.e., that
require human judgment and that measure more complex, integrated, and
strategic objectives). Three recommendations dealing with the new
infrastructure for Computerized Educational Assessment (CEA) are: (4)
foster new item types and uses of portable answer media in order to
utilize the current testing infrastructure more creatively; (5)
encourage the development of a localized infrastructure of Integrated
Learning and Assessment Systems, and the coordinated evolution of
central sites for development of help systems and tests, and for
research and development; and (6) encourage the professional
development of teachers and other professionals who are knowledgeable
and skilled about both the human judgment and the technical aspects
of CEA, and are skilled at integrating assessment with instruction.
Recommendations dealing with policy are: (7) federal and state policy
should both provide research and development funds and stimulate
private sector investment in improving technology-based assessment
practices; and (8) high professional testing standards must be
maintained and must continue to evolve for CEA systems. Eight tables
present study data, and two illustrative figures are included.
(SLD)
EXECUTIVE SUMMARY

This paper was commissioned by the Office of Technology Assessment (OTA) of the U.S. Congress
in August 1990 as a part of a major review of educational assessment. Of several commissioned projects,
this one deals with computer technologies and their application in education.

One problem motivating the OTA statement of work is that computer advances have [...] led to
fundamentally new paradigms and approaches to measurement of human cognitive [...] ability, or
achievement. The research question addressed by this paper is: how can hardware and software
technology be marshalled for effective educational assessment? Can such computer applications
reduce some of the testing problems inherent in current methods of educational assessment? Current
testing methods have been criticized in many ways, including the failure to assess [...]
problem-solving processes. These processes are vital to a citizenry who must compete in an
increasingly complex technological environment, and in an intensely competitive world. Furthermore,
as a senator and congressman who encouraged this study [...] competencies rather than complex
thinking [...] in our schools?

The Statement of Work required that this general problem area be approached through a two-fold
analysis: (1) Current Uses: How is computer technology currently being used for the assessment of the
various objectives of assessment? (Consider the benefits and limitations of each of the technologies
currently being used.) (2) Future Uses: Looking ahead, what emerging software and hardware
technologies may have implications for educational assessment, both through extending existing methods
of assessment and through generating completely new ones?

The two most extensive of the four sections of this paper, Sections II and III, meet these
requirements of the statement of work for analysis of current practices and future possibilities. In
addition, Section I sets the stage and Section IV presents a summary, conclusions and recommendations.

FINDINGS: THE ACTUAL AND THE POTENTIAL

Educational measurements may be administered using either portable answer media or interactive
testing stations. Portable answer media include answer sheets, readable on optical scanning equipment,
or portable keypads or barcode readers which facilitate the entry of a letter or a number. Current practice
is dominated by the use of printed tests with scannable answer sheets.

Computerized Administration: When computers are used to administer a test, we lose portability
and must install computerized workstations or special simulation devices in a learning center or assessment
center. Portable computers, including notebook computers, may be used in the future for administration
of educational measures at temporary locations.

Computers may be used in any of several processes during the life cycle of an educational
measurement instrument. They may be used to aid in:

1. design and development of measurement instruments;

2. distribution of measurement instruments to testing locations;

3. administration of the measurement instruments; and,

4. analysis and record-keeping after administration.

Computers can improve and even transform any of these processes, but the emphasis in this paper is on
the processes of administration. When the measurements are administered by computer, it is likely that
computers are used extensively in the development, distribution, and later analyses as well.

A model for technology diffusion used by OTA distinguishes three levels of penetration of a new
technology:

1) Substitutive 

2) Incremental 

3) Transformational 

These three levels provide a useful framework to report the findings. Computer administration may be
used as a substitute for conventional test administration, which uses scannable answer sheets, printed test
booklets, and occasionally, adjunct audio and visual media or objects which the test-taker manipulates.

Testing has not always been dominated by printed, group-administered formats made up of a
goodly number of short items. Alfred Binet's pioneering intelligence test developed early in this century
was individually administered. It provided the test-taker with a variety of standardized tasks that could
be answered vocally, or through a physical performance [...]. Many useful clinical tests today are
individually administered, human-judged tasks. The Army Alpha, an intelligence test used to screen recruits,
represents the first widespread use of group-administered paper and pencil multiple-choice items. Such
tests are wholly objective - that is, they do not require assessment in the judging of
responses to individual items. A distinction is made between assessment and measurement in this paper.
Assessment requires human judgment to interpret observed responses and assign scores, and to make
decisions based on the scores and other information. By contrast with this costly procedure, group-
administered paper and pencil tests provided simple scoring rules. The number of correct items is typically
summed and a simple scoring formula is applied to correct for guessing.
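
The scoring rule just described can be illustrated with a minimal sketch. It assumes the classic correction-for-guessing formula, score = R - W / (k - 1) for items with k answer options; the paper does not state which formula it has in mind, so both the formula and the function name `formula_score` are assumptions made for illustration.

```python
def formula_score(num_right: int, num_wrong: int, options_per_item: int) -> float:
    """Number-right score corrected for guessing.

    Assumes the classic rule R - W / (k - 1). Omitted items are neither
    rewarded nor penalized. The choice of formula is an assumption, not
    something specified in the paper.
    """
    if options_per_item < 2:
        raise ValueError("items need at least two answer options")
    return num_right - num_wrong / (options_per_item - 1)


# Example: 40 right, 8 wrong, 2 omitted on a 50-item test with 5 options per item.
print(formula_score(40, 8, 5))  # 38.0
```

Under this rule a test-taker who guesses blindly on every item expects a corrected score of about zero, which is why the penalty per wrong answer is 1 / (k - 1).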

Substitutive Uses of Computers

It is perhaps not surprising that the current uses of computers in educational measurement are
primarily substitutive for paper and pencil item tests, rather than individually administered tests, since
the paper and pencil tests represent the dominant format for educational measurement. Substitutive
applications of computers take away the answer sheet and test booklet and present the items on an
electronic computer display, receiving the responses from a keyboard or pointing device. These
substitutive tests use the same kind of scoring rules used in the paper versions.

The use of such tests brings many benefits, including greater efficiency [...]
a closer link of achievement tests to instructional modules, both in time and in content. Even when
separated from the instruction as separate pre- and post-tests made up of verbal items, the integration
with instruction is much tighter. When delivered by the same interactive computer system used to provide
instruction, hints and helps following testing can be given at the moment of interest and of need.

Weaknesses of Computerized Tests. Unfortunately, the wholly substitutive use of computers cannot
transcend the measurement limitations of the original paper and pencil tests, nor will it eliminate the
negative consequences if computerized tests are still administered as separate and aversive activities
not integrated with instruction. Item tests have been criticized for measuring shallow factual verbal knowledge
and being less effective at measuring integration, organization, synthesis, problem solving, design,
creativity, motivation, persistence, etc. More damaging, when such tests are imposed as measures of
minimum competencies in reading and math and schools are held accountable for bringing up scores on
item tests, they become the focus of instruction. Boring drills to prepare for the tests may monopolize the
time that would be given to inherently more interesting subjects and to thinking, problem solving,
producing, etc.

Incremental Improvements Over Substitutive Computer Tests

Several incremental improvements over computerized conventional tests are currently in use. The
two most widely used systems are called computer adaptive tests and computerized mastery tests. Unlike the
computerized conventional tests that give a fixed number of items in a fixed order (like the paper tests
they substitute for), the incremental improvements change the sequence and the number of items
dynamically. These tests also use conventional items, but in an adaptive test, each new item is selected
depending upon how the student is doing, with more able students getting more difficult items and less
able students getting easier items. Computerized mastery tests administer a series of item clusters,
stopping when a decision has been reached about whether the test taker is above or below some pre-
established cut score. Both take far less time for more accurate measurement.
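
The two ideas above can be sketched in a few lines of code. This is only an illustration under simplifying assumptions: the adaptive test uses a plain up/down staircase with a halving step rather than any particular psychometric model, and the mastery test stops on a fixed margin around the cut score rather than on a formal sequential decision rule. All function names, parameters, and numbers are hypothetical.

```python
def next_item(difficulties, used, ability):
    """Pick the unused item whose difficulty is closest to the ability estimate."""
    candidates = [i for i in range(len(difficulties)) if i not in used]
    return min(candidates, key=lambda i: abs(difficulties[i] - ability))


def adaptive_test(difficulties, answers_correctly, max_items=5, step=1.0):
    """Staircase sketch: harder items after a correct answer, easier after a miss."""
    ability, used = 0.0, []
    for _ in range(min(max_items, len(difficulties))):
        item = next_item(difficulties, used, ability)
        used.append(item)
        ability += step if answers_correctly(difficulties[item]) else -step
        step /= 2  # smaller adjustments as evidence accumulates
    return ability, used


def mastery_test(cluster_scores, cluster_size, cut=0.7, margin=0.15):
    """Administer clusters until the proportion correct is clearly above or below the cut."""
    if not cluster_scores:
        raise ValueError("at least one item cluster is required")
    right = seen = 0
    for n, score in enumerate(cluster_scores, start=1):
        right += score
        seen += cluster_size
        proportion = right / seen
        if proportion >= cut + margin:
            return "master", n
        if proportion <= cut - margin:
            return "non-master", n
    # No early decision: fall back to a simple comparison with the cut score.
    return ("master" if right / seen >= cut else "non-master"), len(cluster_scores)


# A simulated student who answers correctly any item of difficulty 1.2 or lower.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(adaptive_test(bank, lambda d: d <= 1.2, max_items=3))  # (1.25, [2, 3, 4])
print(mastery_test([5, 5], cluster_size=5))  # ('master', 1)
```

In both cases the number of items actually administered depends on the responses, which is the source of the time savings the text describes.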

Other incremental improvements aim to derive more information of a diagnostic nature out of the
student's responses to the items. These systems combine cognitive analysis with measurement in a
promising way, but are largely experimental.

Improved displays and response processing offer many incremental advances. Computer graphics,
color, animation, video and audio add interest and realism to the item types possible in computerized
measurement.

Transformational Uses

As the OTA statement of work suggests, there are transformational possibilities inherent in the
new interactive computer technologies and multi-media display capabilities. Not only can items currently
transcend the limited verbal representation on the printed page through multi-sensory presentation, but
the definition of an item as a short, quickly [...] tasks of longer duration which require
significantly more integrated processing. These complex and
integrative tasks can be presented as simulations or games, or as tool-like laboratory environments that
require exploratory, constructive, and hypothesis-testing activities [...]. Such highly
integrated and conceptually demanding tasks, when used in assessment, can truly be labelled as
transformational. Unfortunately, such tasks are not yet having much impact in educational assessment.
They are slowly coming into use in experimental instructional settings. They can be scored holistically by
human judges, but the psychometrics for scoring computer [...]
their reliability and validity of such scores has not been developed. The infrastructure for developing,
distributing, and interpreting them is not in place.

In another potentially transformational use of computer technology in assessment, students are
provided with computer tools for designing, developing and producing "exhibits": written documents,
presentation materials, performance scripts or plans, or other products of mind. Software productivity
tools exist for multi-media design, editing and production of student exhibits. These can be used in
conjunction with project assignments and with portfolio assessment methodologies to get at learning
objectives that deal with creative production. In portfolio assessment methods, students use portfolios to
manage their own set of exhibits, including the intermediate stages in the development of some of the
exhibits. Such uses of technology can be truly transformational in both instruction and assessment because
they provide a penetrating method to assess process as well as product. Using holistic assessment based
on human judgment, the students and teachers can discuss intermediate products and can consider the
issues of strategy and tactics leading to the production of a polished final product. An exciting prospect
for the future is that student responses can be recorded while they are using computer design and
production tools. Future intelligent software can be used to provide hints and helps to improve strategy
and tactics at the moment of need.

Another transformational possibility is that the use of productivity software tools may be
augmented through access to large data bases of images, audio and text. Software for searching such data
bases can be used in the production of student exhibits. These potential transformational uses of
computers are gaining momentum in the schools as a part of instruction in writing and other subjects, but
have not yet been formalized as a new assessment method.

Transformation of Whole Systems. Potentially the most powerful of transformational possibilities
inherent in the idea of computerized educational assessment lies in transforming the relationship of
assessment to instruction [...]. Educational measurement has been viewed
as an activity separate from instruction. It precedes an educational sequence or follows it, but it is left up
to the teachers to prepare and use their own measures during an educational sequence. The patterns
teachers may look to in making up their own measurement instruments emphasize judging and grading
rather than helping and guiding. In its most visible embodiments, centrally prepared rather than teacher-
prepared, Educational Measurement is viewed with fear and dislike, and is not seen as integral to the
processes of learning and teaching.

Printed materials that integrate assessment with instruction and can aid in instructional
management are possible, and some have been developed, but computer-based systems that integrate
instruction, educational measurement and management offer a major transformational step beyond these.
Such systems can transform the roles of educators and their effectiveness. Their roles as decision-makers
can be enhanced by supplying them with continuous and up-to-date information to guide their
interpretations and decisions. Students can become active problem-solvers, strategists, and producers
rather than passive recorders and regurgitators.

Help Systems Versus High-Stakes Tests. A computerized assessment system that is fully
integrated with instruction and is never used for grading or for high-stakes accountability is called a "help
system." It is distinguished at some length from a high-stakes test in Section I. If it is true that the nation
needs to use more school time for progress and growth, and less for grading, judging and measuring
[...], then help systems which integrate assessment and instruction (and are never used for grading
either the students or the teachers) offer a promising alternative.

Despite slow progress toward transformational and even incremental forms of computerized
assessment, the message of this report is positive and hopeful - that Computerized Educational
Assessment (CEA) offers great potential for the assessment of individual learners, for the evaluation of
educational programs, and for the transformation of professional roles and practices of educators.



Educational Measurement Infrastructure

The key research question stated in the statement of work is: "How can increasingly powerful
hardware and more sophisticated software be marshalled for effective [...]" [...]
this question, the nature of the educational measurement infrastructure must be [...]. Two aspects
of this infrastructure are discussed in the paper: the technological base and the human talent. The
infrastructure of the current system is centralized. Large computer facilities with paper processing
equipment are at the center. These facilities are managed by testing companies, state education [...]
and in some cases, professional organizations who test for [...]. There is a human
infrastructure of part-time test administrators for large national testing organizations [...]
college teachers and secondary school teachers. They use borrowed or leased facilities [...]
administer the tests under controlled conditions. Unfortunately, there is no permanent physical location
for computerized assessment (even assessment intimately integrated with instruction) and [...]
talent base to administer the assessments at the point when it could benefit the learning [...]
processes of school children. By way of contrast, colleges and many high schools have admissions offices
and guidance offices which interpret educational measurements to make admissions decisions and
placement decisions, and to provide guidance and counselling.

Since the current infrastructure for educational measurement is already in place, the question
arises as to whether it can be revitalized with a host of new item types and new methods that use printed [...]
and scanned answer sheets more creatively. This paper reports on exciting new printed item types that
can revivify and improve the current testing infrastructure. Students can mark out words in running text,
identify objects or structures by marking on printed pictures, can draw arrows, lines or [...], and
can even print numbers and letters that can be recognized by the optical scanners and associated software.

A new infrastructure is needed in the schools, consisting of networked computer workstations, in
order to obtain the transformational benefits of computer uses in assessment. This equipment is necessary
to introduce computer [...] standardized performance tasks, including simulations and games, and
process measures during tool use in the production of student exhibits. In addition to capitalizing and
installing the technology base, a new human talent base must be developed. Such a new local
infrastructure is being installed in the schools even now. There are two kinds of systems being installed:
Integrated Learning Systems (ILS) and networked computer labs for tool use. The dominant mode now
evolving uses a relatively powerful file server computer with large central disk files [...]
and for record-keeping. These systems may be used for computer-aided instruction; they may also be used
as labs for productivity tool use. The integrated learning systems come closest to the environment
foreseen in this paper as providing an infrastructure for the delivery of computerized educational
assessments integrated with instruction. The recommended system is referred to in this paper as an
Integrated Learning and Assessment System (ILAS). This is a concept, not a particular product; it is an
evolutionary improvement over the Integrated Learning Systems (ILS) and the computer labs that now
constitute an important growth industry in education.

By integrating assessment, instruction, and records management, the ILAS offers the most
profound transformational opportunity of the computer. It provides an environment where sensitive and
important assessment, in the full sense of the term defined herein, can take place in association with
good management and instructional decision-making. These decisions are also enhanced by the properties
and features of the ILAS. Skilled teacher/managers, who are also good assessors and decision-makers, can
develop professionally over time in a context of teaching with the use of an ILAS. This provides a
challenging and promising professional growth path for teachers toward new roles more consonant with
national needs. In achieving this professional growth, teachers working with Integrated Learning and
Assessment Systems will be greatly strengthened in their efforts to lead students to high standards of
achievement in the complex and demanding objectives needed for success in a technologically intensive and
internationally competitive world.
The recommendations developed as a result of the analysis of current trends and future
possibilities provide a vision and a direction. Specific details will need to be worked out state by state,
district by district, school by school, and company by company within the rapidly growing ILS industry.
This industry, which has an annual volume of over $800 million, is [...]
toward the integration of measurement and assessment with instruction (ILAS). Because of the generality
of the recommendations, they have implications for research, development, and implementation support.
There are substantial policy implications for these recommendations as well, for they lead toward
transformational innovations. In particular, the entire Integrated Learning and Assessment System (more
properly, Integrated Learning, Assessment and Formative Evaluation System) that is envisioned in this
paper is transformational of the roles and activities of educators, both administrators and teachers, and
of students. There are power shifts as students and teachers are given more control over the information
they need to make decisions, and more control over the resources and policies that are connected to the
decisions they will need to make to optimize and manage the learning enterprise. Principals and other
administrators must become instructional leaders and interpreters of decision-oriented data gathered from the
high-stakes assessments and formative evaluation data.

The recommendations are divided into two major groups, and there are three recommendations
in each group. The first set deals with the purposes and methods of testing in the schools, and the second
set deals with building the needed infrastructure for computerized administration of
measurements.

Three Recommendations Dealing With the Purposes and Methods of Testing in the Schools

Recommendation One: Greatly increase the frequency and variety of help services compared to
high-stakes assessments, but balance the two.

Recommendation Two: Greatly increase the frequency of formative evaluation, and provide funding
and incentives to use the evaluation data for ongoing improvement of educational programs.

Recommendation Three: Increase the use of alternate methods of assessment (i.e., that require
human judgment and that measure more complex, integrated, and strategic objectives).

This recommendation is explained in terms of a conceptual framework consisting of four
measurement methods and four kinds of objectives. The following table summarizes Recommendation
Three, which calls for research, development, and implementation that emphasizes the measurement
methods of performance tasks, exhibits, and process measures during tool use, and thus more teaching
and learning of the higher-order constructs of integration, creative production and strategies. The use of
item tests for scaffolding knowledge should be improved and extended beyond scaffolding knowledge
(verbal knowledge about some topic - terminology, definitions, classifications, simple rule use) as far as
possible.

Recommendation 3: Measurement Methods Suitable for Different Objectives

The greater the number of asterisks in a cell, the more appropriate a particular measurement
method is for the class of objective listed to the left. Thus, item tests are the best suited for measuring
scaffolding objectives. In general, performance [...]
objectives and process measures [...] the greatest potential for measuring strategies.

Three Recommendations Dealing With the New Infrastructure for Computerized Educational Assessment
(CEA)

Recommendation Four: Foster new item types and uses of portable answer media in order to
utilize the current testing infrastructure more creatively.

Portable answer media (mainly answer sheets) should be freed from domination by the multiple
choice item type. It is now possible to develop and introduce many new item types and task types for
[...] delivery in help systems as practice and feedback [...]
outcomes. In short, use the infrastructure that is in place to broaden the assessment options available.
Recommendations Five and Six deal with building the new technological and human infrastructure
for computer-administered assessment.

Recommendation Five: Encourage the development of a localized infrastructure of Integrated
Learning and Assessment Systems, and the coordinated evolution of central sites for development of help
systems and tests, and for R&D.

Recommendation Six: Encourage the professional development of teachers and other professionals
who are knowledgeable and skilled about both the human judgment and the technical aspects of CEA, and
are skilled at integrating assessment with instruction.

Recommendations Dealing with Policy

Recommendation Seven: Federal and state policy should both provide R&D funds and stimulate
private sector investment in improving technology-based assessment practices.

Recommendation Eight: High professional testing standards must be maintained and must continue
to evolve for CEA systems.
SECTION I: FRAMING THE ISSUES



BACKGROUND: ISSUES IN MEASUREMENT

How is measurement viewed and used in Education vs. other occupations? What are the
differences between measurement and assessment, and the shifting social roles of each?

Contrasting Views Toward Measurement

We scarcely notice how measurement standards for time, quantity, distance, velocity, and money
structure our daily [...] and commerce, guiding our decisions. For scientists and engineers,
measurement is an inseparable ally, used to calibrate and test equipment, quantify observations, and validate
hypotheses, theories and designs. For skilled operators of complex systems [...]
appropriate array of instrument readings indispensable to timely decision making. Business decision
makers measure the performance of their business unit, [...] expense, and
productivity. The quality of their strategic decisions depends on the accuracy and completeness of their
data. For professional artists and athletes it is the feedback of coaches during practice, and scorecard
statistics, that are indispensable to improving performance. In these occupations, however, measurement
is as familiar and unobtrusive as are the common standards for the lay public. It is a totally integrated
aspect of work, eagerly sought, always essential to sound decision making.

In contrast, educational measurement is less integrated and more intrusive. It has become a
coercive tool of the educational system to assure compliance in the [...],
an embarrassing index of intelligence or illiteracy, a feared gatekeeper of opportunity, a weapon used by
administrators to indicate accountability and academic productivity. Teachers and administrators may
resist the introduction of measurement, but in this they are not so different from people in other
occupations. Few employees or professionals are enthusiastic about measurement when they are being
evaluated in bureaucratic settings by persons external to their work group. When measurement is
intrusive, high-stakes and threatening, it is not welcome.

Is there a kind of measurement system in education that would evoke less resistance, dislike and
fear than the current system? Could measurement become a more integrated and helpful aspect of the
work of administrators, teachers and students? Certainly there must be a way, and it will have to involve
a move toward more helping and less judging.

America needs a highly professional educational workforce. One indisputable mark of
professionalism in many occupations is the ability to interpret measures essential to their work with
sensitivity, balance, and good judgment. Timely and appropriate information is as important for educators
as for others to support and clarify their decisions. The feedback provided by helpful measurement is not
only vital for immediate decision making, but also for improving future decisions based on a framework
that enables one experience to be compared with another. Without clear indicators of desired educational
objectives, educators orient their priorities toward unspoken and unwritten priorities, including those
unrelated or even inimical to student learning.

New developments and technologies are vitally needed in Educational Assessment. The invisible
intellectual acquirements of students are very hard to measure, and it is even harder to do so in a timely
and up-to-the-minute way. The performance and productivity of [...]
classroom, a local school, a district or a state, is also hard [...]
instruments which now predominate in measurement practice. Educational measurement experts devise
scales to make visible many subtle nuances inherent in learning, problem solving, and thinking. In the
hard sciences, the more deeply embedded and invisible is an hypothesized construct, the more costly it is
to measure, requiring complex instrumentation. Might it be possible to equip the classrooms of this nation
with computerized assessment instrumentation that will enable measurement to become the inseparable
and unobtrusive ally of education? The answer this paper gives is "yes, but...". Yes, there are great
opportunities inherent in current and future uses of computerized measurement instruments. But
significant R&D is needed, a new infrastructure must be put in place, and thoughtful policy development
must guide both.
A Distinction Between Measurement and Assessment

Taking measurements is not the same as performing an assessment. An assessment requires
interpretations and human judgments, while a measurement alone does not. Consider an example from
medical practice: A pediatrician may measure very precisely the height and weight of a child, but to
interpret those measurements and take action requires an act of judgment, an assessment. There is no
unambiguous diagnosis of overweight or underweight based on the measurements alone. Other factors
such as age, sex, genetic factors, and health history may be even more important than the measures for
a given diagnosis and prescription. Especially valued is subtle clinical judgment based on years of
experience in interpreting the appearance, demeanor, odor and other subtle cues.

A particular assessment can be valid or invalid. Consider the sequence of activities that follow the
measurement of some set of human attributes.

measures --> interpretations --> decisions --> consequences

The act of measurement is neither valid nor invalid (it may be appropriate or not, or accurate or
not), but the interpretation that follows the measurement may be valid or invalid. These judgments should
employ more information than is available from a single measurement, [...]
high stakes to some person or persons. Similarly, the decision is a judgment based on the interpretation
and other information, and on the options available. A decision has consequences, both negative and
positive, which must be considered by the decision maker. Good tests provide statistical evidence that the
measure will predict future outcomes. Knowing this, the decision maker can be more confident that
certain positive outcomes are more likely than negative ones. Good tests also provide published evidence
that a decision will at least be as fair and equitable as possible when negative consequences for some group
of test-takers follow.

Assessors and evaluators cannot escape the responsibility for making valid assessments based on
all the evidence available. For assessment to be used in a more powerful manner in schools, far more
training and sophistication will be needed, especially when the assessments are high-stakes. Evaluators
should not be permitted (or coerced by external threats) to shrug their shoulders and "let the
measurement make the decision."

Evaluating America's Educational Needs

America's policy makers have read a variety of measures and indicators and know that we are "a
nation at risk." Consider a hypothetical national scale of educational achievement. At the bottom of the
scale is a large number of educational drop-outs and failures. A disproportionate number are poor, or from
ethnic and cultural minorities. The middle range of our hypothetical scale also presents a grim picture.
Those who successfully enter the workforce may not be sufficiently skilled to enable our economy to
participate effectively in an increasingly competitive global market. At the top of the educational
achievement scale the situation is positive - top colleges and universities in the United States are "world
class". The fact that increasing numbers of faculty and graduates are foreign-born [...]
indication that America's candidates for higher education are competitive, but it is also an indication that
our universities are sought out by the best minds in all nations.



The terms Educational Measurement and Educational Assessment are defined
[...] In educational publications, the term
[...] The emphasis in this paper is on making valid
interpretations and decisions for action from measurement. The concept of
[...] is interpretations and the consequences of action, not
[...] S. Messick, "Validity", in R. L. Linn (Ed.),
Educational Measurement, Third ed., (New York: Macmillan, 1989),

Moving from the hypothetical to the actual, there is recurring and growing criticism about the
standardized educational tests we use to measure achievement. These criticisms include charges that the
tests do not address higher order thinking skills, adaptability to new circumstances, or creativity. Instead,
they are seen to measure disconnected snippets of knowledge that are soon forgotten and are not
integrated into a knowledge structure that can be used, along with powerful thinking and [...]
to deal with the complex technology and new learning demanded of world class workers and citizens.

Has an Historical Shift Occurred in the National Needs Served by Education?

In the past, success in school was not necessary for entry into the job market. As a result, not all
children were expected to succeed in school. Testing practice reinforced the primary goal of education -
to select and sort out a small percentage of students qualified for managerial and professional [...]
from the majority of students who would ultimately be absorbed into a low-skill workforce. In order to
achieve this goal, educational tests that sorted and ranked people gained widespread popularity. The most
cost-effective technology for this purpose - scannable answer sheets and multiple choice test booklets -
was implemented nationwide in schools and colleges, and has become the dominant commercially developed
educational measurement tool in use at this time.

Today, the education and assessment paradigms are changing in conjunction with dramatic
worldwide power shifts that will impact every aspect of society, particularly the way we educate our
citizenry. As unskilled and semiskilled jobs continue to decrease, sorting and selecting lose importance.
A higher goal is to provide for each student instructional help so powerful that anyone with adequate
motivation can succeed. Finding a way to offer such help to all children is necessary in order to meet
President Bush's goal for the year 2000 of providing an education sufficient to meet international
workforce demands. Technology may be the only feasible means to provide both instruction and
assessment at the level of intensity required.

Our perception of standardized testing must be reconsidered in a new assessment context - one
that recognizes the value of thinking, creativity, decision-making, teamwork and the technological and
interpersonal skills required to succeed. Conventional tests have merit, but it is what we are measuring
and how that measurement is integrated with the professional practice of education that must change.
The solution must integrate assessment with instruction and learning. It must reflect clear national goals,
both for student achievement and for teaching practice. It requires sensitivity to the economic needs of
states, and should celebrate the uniqueness of the individual.



PURPOSES AND OBJECTIVES OF EDUCATIONAL MEASUREMENT AND ASSESSMENT

A key requirement of the statement of work is to review computer applications to "the various
objectives of assessment." A chapter by Millman and Greene in the third edition of "Educational
Measurement" provides a good review of purposes and objectives for testing. As a part of a fundamental
description of purposes for testing these authors distinguish between educational measurements taken




[...], Education and U.S. Competitiveness: The Community
[...] (Austin, TX: IC2 Institute, The University of Texas at Austin, October,
1989). Roe discusses the changes needed in the education continuum, from K-12
and on into the college and workplace to produce a world-class workforce. Beyond
the workforce, we need world-class citizens with knowledge and wisdom to vote and
uphold the higher accomplishments of our civilization, and with the leisure,
means, and desire to pursue service and culture.

W. C. Norris, "The Future of the Information Era," in R. K. Heldman (Ed.)

BEFORE, AFTER or DURING an educational sequence. In this paper, the concept of educational
sequence may be viewed as several years of study (elementary, middle, or secondary school, two- or
four-year college programs, professional programs) or as a shorter sequence (i.e., a short sequence of
courses, a single course, or a unit within a course). Before the educational sequence a person is an
applicant or a recruit. Afterward they are graduates, drop-outs, or failures. In process, they are learners.
The current system is designed to provide teaching and resources to help them make progress toward
successful graduation from that sequence. Thus, in educational assessment, the following exemplify
purposes and objectives before, during, and after.

Purposes of Measurement Before an Educational Sequence
o For selection among applicants
o For placement of recruits
>o For guidance services
   - Course of study planning
   - Learner profiling

Purposes of Measurement During an Educational Sequence
o Incremental grading
o Incremental failing (quizzes and mid-terms along the way)
o Routing to non-academic and "special" tracks
>o Measurement services to help individual students monitor progress
   - extrapolate future progress
>o Help in formulating the sequence and adapting the sequence to the progress of
   each learner
>o Learning feedback: Hints and helps while students grapple with the task
>o Advice to teachers for grouping students for instruction or for projects
>o Clarifying the assessment standards and making these assessment standards
   the explicit goals of learning

Purposes of Measurement After an Educational Sequence
o Assigning final grades and failures
o Graduation
o Certification
o Licensing
o Selection for jobs
o Selection for scholarships, awards, etc.
>o Guidance services
   - Career guidance
   - Vocational counseling
   - Exit counseling

At the beginning of each list above are bulleted items ("o") that represent assessment imposed
externally on individuals and generally used for making high-stakes [...]

[...] The remaining items in each list, designated by ">o", are items that we will call
help services. These are typically referred to as guidance [...]

graduates or [...]. There is a whole class of help services during the process of learning, less
common, but these may be far more important in achieving the nation's goals.
Unfortunately, measurement in [...]

untouched by testing companies and measurement professionals. It is an area left largely to teachers.
This state of affairs represents an anomaly in the field of educational measurement. Leaders call for



Millman and Greene, "The Specification and Development of [...]"
[...] on page 336 that summarizes purposes for testing.

measurements that will help improve, not just bring good or bad news. Measurement professionals have
long sought the "holy grail" of measurements that truly help learners and teachers, but the dominant
professionally developed tests continue to be high-stakes tests given before or after an educational
sequence. The productivity of measurement scientists in developing [...] measurement
instruments that will help teachers and learners in process is disappointing. Most of the attention and
resources are given to work on externally imposed measures to support high-stakes decisions. Thus
Stiggins was constrained to point out in 1988 that:

"Amid the whirlpool of publicity, political turmoil, and scholarly debate currently
surrounding the development of a nationally standardized test, or statewide assessments, and of
measurement driven instruction, we are again failing to address the central issue in school
assessment: insuring the quality and appropriate use of teacher-directed assessments of student
achievement used every day in classrooms from coast to coast."

Teachers develop the best measurement tools they can, but the models they follow derive from
the dominant practices they have experienced: assessment with strong consequences before or after
educational sequences. They have learned norm-referenced testing, so they write tests that spread
students out and then grade on the curve. Often they use grading as a way of providing feedback to
learners, but frequently they unwittingly violate vital standards for validity of an assessment; e.g., they
assign numbers that do not correspond to what is being measured (such [...]

test score for tardiness, as though mathematics achievement and punctuality were on the same
measurement scale). They make inferences from test scores that are not valid (e.g. giving a low grade on
a supposed measure of educational knowledge to punish or control behavior). They may be unaware of
the plethora of possibilities for help services opened up by computer technology, which can provide
continuous measurement.
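
The difference between grading "on the curve" and judging against a fixed standard can be made concrete with a small sketch. It is illustrative only: the function names, the sample scores, and the cut score of 65 are assumptions made for this example and do not come from any testing program discussed in this paper.

```python
def percentile_ranks(scores):
    """Norm-referenced view: each score's standing relative to the group.

    Returns, for each score, the percentage of scores strictly below it.
    """
    n = len(scores)
    return [100 * sum(other < s for other in scores) / n for s in scores]


def passes(scores, cut_score):
    """Criterion-referenced view: compare each score with a fixed standard."""
    return [s >= cut_score for s in scores]


# Hypothetical class scores (assumed data for illustration).
scores = [55, 70, 70, 90]
ranks = percentile_ranks(scores)        # depends on how classmates performed
passed = passes(scores, cut_score=65)   # depends only on the chosen standard
```

In the first view a student's result changes whenever classmates' scores change; in the second it changes only when the student's own performance crosses the standard.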

Contrasting High-Stakes Assessment and Help Services within

In a high-stakes assessment, a person or group with appropriate authority and professional
credentials makes an interpretation of information about an individual that can have a major impact on
that individual's life. The term "help services" is probably not a familiar one to most readers of this paper,
nor is it a familiar term in the educational measurement literature. It is a term used in this paper to refer
to the use of educational measurement to guide and help the learner in accomplishing important educational
goals. The before and after uses are familiar ones: guidance and counselling. This guidance may be "high
stakes" in the sense used herein, if a critical interpretation and decision is made that affects the person's
life, and that decision is made without the examinee's consent, but if the course of action is left up to the
individual, it is classified here as a help service.

Table 1 characterizes the distinction between high-stakes assessment and help services for
individuals.

Nine points of distinction are given. In each the acronym TEST (Test - Education Sequence -
Test) is used to refer to the high-stakes test (usually given before or after the sequence), and the term
Help System is used to refer to a help service that uses measurement continuously to improve the process
of learning and instruction.



* Paul H. O'Neill, Chairman of President Bush's Educational Policy Advisory
Committee, and CEO of Aluminum Company of America: "We have some good tests that
tell us something, but they don't tell us in a way that would allow us [to] make
specific interventions in the process... They tell us we're not doing very well;
they don't suggest why not." Quoted in Robert Rothman, "2 Groups Laying Plans
To Develop National Exams", Education Week, Vol. X, No. 4, (1990 Editorial
Projects in Education, Sept. 26, 1990).

Richard J. Stiggins, "Revitalizing Classroom Assessment: The Highest
Instructional Priority," Phi Delta Kappan, (January 1988), pp. 363-368.

Table 1: Characteristics of TESTs and Characteristics of Help Systems

Any measurement, whether a TEST or a Help System, is designed to provide relevant, reliable,
fair, and timely information to professional people (or to people [...]
professionally appropriate manner) to help them make valid and defensible decisions.

1. WHY: TO RANK STUDENTS OR TO ADVISE THEM?

TEST: To Rank Them. The BEFORE TEST is designed to spread people out as much as possible
along the score scale. A wide score spread facilitates ranking, and thus promotes comparisons
between those who are higher and lower on the scale. The AFTER TEST is usually designed to
spread people out to facilitate grading, but in criterion-referenced measurement it is used to
determine "passing" at some carefully established level.

Help System: To Advise Them. For a Help System, ranking is irrelevant, and "scores" may not
even be visible. What is disclosed to the learners and teachers is information to help them make
better decisions that will facilitate progress from one step to the next.

2. WHAT IS THE NATURE OF THE DECISIONS TO BE MADE?

TEST: High-stakes decisions about individuals. These decisions can have a significant impact
on the future activities and opportunities of the testee, but the decisions are made for them by
someone else.

Help System: Low Stakes Decisions about Learning Progress. Decisions about which learning
units in an educational sequence to take next or about how to correct and improve within a task
have a small impact, and mistaken decisions can be corrected quickly.

3. WHO ARE THE PROFESSIONAL DECISION MAKERS?

TEST: Officers of an Institution. For TESTs, the professionals are admissions officers, State and
District Administrators holding schools accountable, school psychologists, faculty groups
considering graduation requirements, and the like. They seek defensible information to back
sometimes unpopular decisions. Teachers make high stakes grading decisions.

Help System: Learners and Teachers. It is the teachers themselves, and the more advanced
learners who have internalized the standards of excellence, who are the professionals.
Psychologists and other professionals cannot stand by teachers and learners to guide them in
making valid inferences from every measurement.
4. HOW DOES TESTING USUALLY TAKE PLACE?

TEST: Separate testing sessions, usually obtrusive to the flow of learning and teaching, when
printed test materials are distributed and used under strict supervision.

Help System: Unobtrusive measurement. The same materials are used to learn or to produce a
student product that are used to measure.

5. WHAT PROFESSIONAL STANDARDS ARE REQUIRED?

TEST: Very High. Proper use of TESTs requires professional knowledge and experience. It
requires knowledge of professional standards for interpretation and use of tests in general
(e.g., that TEST scores should not be the sole basis for a high-stakes decision). It also
requires specific knowledge about the particular measurement instrument and the situation.
Proper use of TESTs may be sacrificed for administrative convenience and may be given lower
priority than the goal of lower costs.

Help System: High. Proper use of Help Systems will also require professional knowledge,
particularly knowledge about a measurement standard - what it means to be good at something at
progressive levels of excellence. There will emerge standards for the use of Help System
information which, like TEST information, can be misused. For example, it will hopefully become
common knowledge that Help System data should not be used in grading students or evaluating
teachers.

6. HOW SECURE MUST BE THE MEASUREMENT TASKS, SCORES, AND REPORTS?

TEST: Utmost security must be maintained to assure that the items or tasks, and key, will not
be known by any test takers in advance, and that the scores and reports will not fall into the
wrong hands.

Help System: Free Disclosure. The standards and the form of the tasks should be known in
advance. Complex tasks may be practiced as often as necessary before the high-stakes tests. (The
learner, however, should have the right to keep progress scores private).

7. WHEN MUST THE DECISION BE MADE (TIMELINESS)?

TEST: Delayed. It is acceptable to have a gap of several weeks or months between the time of
TESTing and the time of decision-making. The decision is important and is scheduled in advance.

Help System: Rapid, even Immediate. Decisions informed by a Help System are the day-to-day,
minute-by-minute decisions learners and teachers must make. These decisions cannot wait for
scoring and interpretive data from a distant location.

8. WHEN: FREQUENCY OF THE MEASUREMENTS WHICH INFORM THE DECISIONS?

TEST: Infrequent. A single TEST, rather than a sequence of measurements, informs most
high-stakes decisions (grading is an exception, where a sequence of tests and quizzes is often
used).

Help System: Continuous. A Help System provides a continuous sequence of measurements,
repeated cycles of Teach <-> Test (more accurately, cycles of Practice <-> Coach). Measurement
is often indistinguishable.

9. WHERE DOES SCORING AND REPORTING TAKE PLACE?

TEST: It occurs at a central site with fast scanners and paper processing machines, along with
large computers and an expert staff.

Help System: It takes place in a decentralized educational setting where the data is
immediately available to teachers and learners.

The recommendation that educational assessment practice emphasize help systems in the future
gains support from the comparisons in Table 1. If America's needs have shifted from selection and judging
to improving achievement for a demographically [...]
that are linked closely with instruction, that are designed to advise and [...]

better decisions about learning progress at the moment of need, are decentralized to the places of learning,
and are continuous. As in all assessment practice, high professional standards must be maintained; but
different standards will be required, and they will have to be adopted by the teachers and the more
advanced learners. Assessment, in contrast to measurement, requires balanced human judgment.

Purposes and Objectives of Program Evaluation

The term program evaluation refers to the collection of data to guide decisions about educational
sequences, and aspects of them. By contrast, the term individual assessment refers to assessments made
about individual students. The focus of the statement of work for this study was on individual assessment.
Program evaluation will be given less emphasis in this paper, and this term will be used to include
evaluations of individual classes or special interventions as well as evaluations of the teachers'
performances within defined educational sequences. We also use this term to refer to statewide and
national assessments, which are used to make interpretive conclusions about the progress and problems
revealed by measurements taken using national and statewide samples.

A useful distinction between two forms of program evaluation is summative and formative
evaluation. Summative evaluation usually emphasizes measurements taken after an educational sequence,
although it may compare them to measures taken before to show gain. The kind of decisions made by
summative evaluation are to approve a particular program or terminate it. Formative evaluation, on the
other hand, interprets after measures in the light of [...] measures and looks for ways to improve the
process so that the desired outcomes will be achieved.

Summative and formative approaches to program evaluation are contrasted in Table 2.



Table 2

Some Contrasts Between Summative and Formative
Approaches to Program Evaluation

SUMMATIVE PROGRAM EVALUATION versus FORMATIVE PROGRAM EVALUATION

1. Any program evaluation, whether formative or summative, is designed to provide
relevant, reliable, fair, and timely information to the appropriate decision-makers to help them
make valid and defensible decisions about programs, and their teachers.

2. What Decisions are Characteristic of Program Evaluation?

Summative: High-stakes decisions about programs and about the personnel roles within programs.
Decisions are made to approve or discontinue.

Formative: Decisions are made to refine and improve, to emphasize or de-emphasize particular
program components, to allocate resources to revise and improve a component.

3. Who Makes the Decisions?

Summative: Administrators with program authority and budget control.

Formative: Developers, teachers, and users concerned with improving the program.
Administrators approve funds earmarked for continuing or special improvements.

4. When are the Decisions Made?

Summative: At infrequent intervals when programs are targeted for summative evaluation and
review.

Formative: Ideally, continuously in [...] of a semester or a year. At a minimum, during the
initial trial runs of a program.

5. How are the Measurements Taken?

Summative: The most common summative method is to look at aggregated student scores on
standardized tests so that comparisons can be made across different schools.

Formative: Fine-grained scoring: both student summaries and direct measures of the system are
used. Measuring individual system components makes it possible to highlight specifics that need
revision.

6. Where Do the Evaluations Take Place?

Summative: At multiple sites of program operation. General standardized measures are collected
that can be summarized across sites. Reports are prepared centrally.

Formative: At multiple sites. General program improvements are sought for all, but with
localization and customization. Local teachers and users play a greater role.


Three Measurement Methods for Program Assessment



1. Summaries of student [...] measures. Both completely objective,
machine-scorable measures and holistic measures requiring human judgment will be
part of a good program assessment.

2. Direct system measures. In a computerized learning and testing environment,
direct measures can be obtained, such as the number of lessons passed per student,
average time, errors, difficulty of the lessons, and approach to or avoidance of
certain elements within the lessons. Task engagement can be measured directly.
Do students experience certain kinds of instruction or not? Do they encounter and
engage critical instructional elements?

This is a very promising area for future development. Other direct system
measures could include time on task, mean time to help, and effectiveness of each
kind of help.

3. Long-term outcomes. Future accomplishments in later classes, higher education,
and workplace settings often give a different picture than course grades when
included in a program evaluation. Selection test scores and grades in initial
academic classes may be highly related to one another, but may not be related to
long-term measures, which after all, are much closer to what our nation needs than
first-year grades.
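
The direct system measures named in item 2 lend themselves to a short illustration. The following is a minimal sketch, assuming a hypothetical log with one record per lesson attempt; the field names (`student`, `lesson`, `passed`, `seconds`, `errors`), the sample data, and the function name `direct_system_measures` are inventions for this example and do not describe any particular system.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log records, one per lesson attempt (field names are assumptions).
ATTEMPTS = [
    {"student": "s1", "lesson": "fractions-1", "passed": True, "seconds": 300, "errors": 2},
    {"student": "s1", "lesson": "fractions-2", "passed": False, "seconds": 420, "errors": 5},
    {"student": "s2", "lesson": "fractions-1", "passed": True, "seconds": 240, "errors": 1},
]


def direct_system_measures(attempts):
    """Summarize lesson attempts into a few simple direct system measures."""
    passed_by_student = defaultdict(int)
    for a in attempts:
        # Touch every student so that those with no passes still count as zero.
        passed_by_student[a["student"]] += 1 if a["passed"] else 0
    return {
        "lessons_passed_per_student": mean(passed_by_student.values()),
        "average_seconds_per_attempt": mean(a["seconds"] for a in attempts),
        "average_errors_per_attempt": mean(a["errors"] for a in attempts),
    }


measures = direct_system_measures(ATTEMPTS)
```

Measures such as time on task or mean time to help would need additional logged events (for example, timestamps of help requests), which this sketch does not model.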

A Note on Assessing Teachers. States use multiple choice tests with the flavor of verbal
and mathematical minimum competency exams in licensing teachers, weeding out those below
a certain cut score. In licensing teachers in this manner, legislators and state administrators have a
dilemma. They can show the public that they are striving for higher standards by pushing up the cut
score on the tests, but to do so often results in the politically unacceptable consequence of rejecting too
many prospective teachers from racial and linguistic minorities, who did not do as well as whites on the
verbal and mathematical items. If this cut score is forced downward by these political realities, then
those accepted into teacher education programs just above this cut score have almost as much need as
those below the cut score for remediation and improvement. Could not help systems for the
professional growth of teachers, both preservice and in-service, reduce this dilemma by s[...]
standards while helping more candidate teachers achieve them? Do not formative evaluation systems
offer a hopeful approach when they take direct system measures and focus on how the instruction
operates? Could these systems not take some of the focus off of poorly defined qualities of teachers
and focus on system attributes that could be changed by management and by better tools?

Summary of Purposes: A Four-Fold Classification of Purposes for Educational Measurement

The distinction between externally imposed measures for high-stakes decisions versus help
services is analogous to the distinction between summative evaluation and formative evaluation. Thus,
a way to summarize the broad purposes for measurement considered in this paper is to consider
individual assessment separately from program evaluation, and consider both High-Stakes and Help
purposes for each.



Table 3

Four Purposes for Educational Assessment

Assessment to support: High-Stakes Assessments versus Help Services

Decisions about individuals (INDIVIDUAL ASSESSMENT)

High-Stakes Assessments: Selection, Placement, Grading, Failing, Graduating, Licensing,
Certification, Selection for awards...

Help Services: Guidance, course planning, Progress Monitoring, Diagnosis, Advice on
strategies, tactics, standards...

Decisions about programs (PROGRAM EVALUATION)

High-Stakes Assessments: Summative Evaluation to grade and judge educational programs, and
teachers, and to hold educators accountable.

Help Services: Formative Evaluation to identify areas of strength, areas to improve, and ways
to improve.



METHODS FOR MEASURING THE ACHIEVEMENT OF INDIVIDUALS



In this section, four methods for educational measurement of individuals will be introduced.
These will form the major organizing framework in both Sections II and III. All four are applicable
toward any of the four categories of purpose shown in Table 3. The four methods are (1) item tests,
(2) standardized performance tasks, (3) creative products (exhibits), and (4) process measures taken
during tool use.

Item Tests: Item tests are made up of test items of a familiar nature. These usually require
very short time intervals to complete, and are appropriate for sampling widely and shallowly from
information domains. The most common items are multiple choice items that have only one correct
answer. Item tests measure scaffolding knowledge well: memorized terms, facts, and short procedures
that can be taught when we "cover" a curriculum rather than teaching it for integration and for
transfer to new situations.

Standardized Performance Tasks: These standardized tasks require the integration of multiple
lower level pieces of information, and simple skills, in order to perform integrated and complex
activities, e.g., solve a problem, design and conduct an experiment, write a document or prepare a
presentation or demonstration. The task takes much longer than a simple test item and has integrity,
unity, and reference to socially valued roles that students might find interesting and relevant.
Performance tasks, unlike items, need not have a single correct answer. They are scored to reflect
different paths or solutions. Students may be given partial credit. Holistic assessment is commonly
used in grading performance; thus, human judges rate the standardized essay, experiment, documented
problem solution, etc. on a scale that may range up to nine or ten points. Different levels of score have
different meaning, so that students may be given specific feedback on the standards for any level.
Ideally, students should be permitted to repeat the performance task until they are satisfied that they
understand the holistic standard for excellence.

Creative Products (Exhibits). An example from athletic competition illustrates the difference
between a standardized performance task and a creative student product. The Olympic figure skating
contestant must first perform a set of standardized "school figures", like figure 8's and jumps. Holistic
ratings on a 10 point scale are used, and the ratings of several judges are averaged. Then the
contestant presents a freestyle program. It must meet certain criteria, but the composition,
choreography, amount of risk, etc. are up to the contestant. The judges have agreed in advance to a
set of standards to rate the free-style programs on the same 10-point scale. The contestants fully
understand the criteria used in the rating system.
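
The averaging of several judges' holistic ratings described above can be written out as a brief sketch. It assumes, for illustration only, that ratings run from 1 to 10 and that a plain arithmetic mean is taken; the function name `holistic_score` and the sample ratings are hypothetical, and real judging panels may use other rules (for example, discarding extreme ratings).

```python
from statistics import mean


def holistic_score(ratings, scale_max=10):
    """Average several judges' holistic ratings given on a 1..scale_max scale."""
    if not ratings:
        raise ValueError("at least one judge's rating is required")
    for r in ratings:
        # Assumed scale bounds: 1 is the lowest rating, scale_max the highest.
        if not 1 <= r <= scale_max:
            raise ValueError(f"rating {r} is outside the 1-{scale_max} scale")
    return mean(ratings)


# Four hypothetical judges rate the same performance.
school_figures = holistic_score([8, 9, 9, 10])
```

The same function could score a free-style program or a student exhibit, since both are rated against standards agreed to in advance on the same scale.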

Examples of student exhibits are writing assignments, reports, presentations, performances,
designs, artistic productions, etc. The tasks cannot be standardized, or else the room for creativity is
diminished. This sort of objective goes beyond integration to enable the assessment of transfer of
learned knowledge and skill to a new situation, putting together what has been learned in a new way,
adding new insights not directly presented in the way the material was originally learned.

The result of the student's creative production is an exhibit, which must also be scored
holistically for primary traits agreed to in advance. In ideal situations, these primary traits are fully
understood by both teacher and student. Like the Olympic contestant, they know the difference
between a performance that is rated 8, 9 or 10. Moreover, the more completely the learners
understand the holistic scoring standard, the more valid that measure becomes. This is in contrast to
coaching for an item test. With an item test, the more we focus on narrow objectives and structured
types of questions, the less valid the test becomes. Frederiksen and Collins, who introduced these
authors to the Olympic skating example, advocate holistic scoring of student exhibits as a method of
increasing "systemic validity;" that is, the more you teach to it, the more valid it becomes, whereas
when you teach to item tests, you teach tricks for guessing and things to memorize to avoid thinking.
This makes the test less valid for measuring the desired level of cognitive functioning.



Scoring Responses During Tool Use: Suppose a student uses an outline processor, followed by a word
processor, for writing an essay in response to a creative production assignment. Whenever the student
uses one of these software productivity tools, he or she is interacting with the computer and the
responses can be scored and interpreted. Research and development, and intelligent software, are
needed to perform this interpretation and to generate hints and helpful advice to help improve the
student's strategies.

It is not necessary to wait until intelligent scoring programs can be put on-line during tool use.
Both standardized performance tasks and student exhibits may be utilized now by using holistic scoring
schemes performed by human raters. As teachers and students learn to assign these holistic scores,
they achieve an important educational objective of understanding in a deep fashion what makes a product
excellent. The use of computer tools during the development of a creative product facilitates this
process. The ability to print out intermediate products, as with a word processor, and discuss and
evaluate them is an excellent way to integrate assessment with instruction aimed at creative
production objectives.

Examples of learning strategies are library research tasks that require searching strategies,
interviewing and questioning techniques, note-taking and review techniques, and methods for
controlling emotional states such as anxiety and procrastination. Examples of performance strategies
are found within both academic and athletic games, and within tasks like writing, speaking, or
performing.

SUMMARY

This section examined the evolving purposes, objectives, and roles of educational assessment in
the 1990s. Measurement and assessment were shown to be fundamental to all occupations and
professions because decision-making is fundamental to all, and decision-making depends on sensitive
judgments based on appropriate and accurate information.

John R. Frederiksen and Allan Collins, "A Systems Approach to Educational Testing,"
Educational Researcher, vol. 18, no. 9 (Washington, DC: American Educational Research
Association, December 1989), pp. 27-32.

Educational measurement plays a peculiar and skewed role compared to measurement in other
professions. There is a lack of consensus on what should be measured, and measurement practice is
skewed toward using the measures for judging, grading, sorting, and selecting. The current
measurement technologies have grown out of the successful use of aptitude tests for sorting and
selecting. However, grading practices and testing for accountability in achievement measurement
resemble these aptitude testing practices too strongly. Thus, a case was made for the development of
a new family of measurement applications called help systems, more integrated with instruction, and
geared toward achievement through learning progress for a demographically diverse student
population. Such a development will better serve current national needs, which are poorly served by
finding better methods for selecting and passing judgments.

Program evaluation, including the assessment of teachers, also appears to be skewed toward
judging more than helping. It could profit from much greater focus on measurement integrated with
instruction for providing help to improve the teachers and their programs.

Section I introduced a distinction between measurement and assessment. Measurement is a
vital process of deciding what attributes to measure, and then providing accurate data so that
decision-makers can determine presence or absence, and more or less of the attributes selected. Educational
assessment uses measures, and other information, about individuals or programs to make an
interpretation and then a decision. Assessment requires human judgment that goes beyond the
accurate measurement of some attribute.

Human judgment is required in three of the four measurement methods discussed in section EL 

1. Objectively scored item tests

2. Holistically scored standardized performance tasks

3. Holistically scored student products (exhibits)

4. Human judgments about intermediate products in the process of developing a
student exhibit, or performing a task.

None of these measurement methods are new. Standardized performance tasks have been a part of
individually administered tests for intelligence and clinical diagnosis for many decades, but they require
highly trained professionals to administer the tasks, rate the responses, and interpret the resulting
profiles. Both standardized written assignments and assignments for creative writing are common, but
also require holistic grading. Good teachers of writing, speaking, presenting, acting, performing,
athletic coaching, and crafts judge intermediate products or performances and provide hints or helps
along the way.

How have computerized methods been used in administering each of the four measurement
methods? Can computerized methods go beyond substituting or incrementally improving item tests for
high-stakes purposes, and beyond the conventional uses of item tests for selecting and judging? Can
computers transform assessment by partially automating the scoring of standardized tasks and student
products, and by providing hints and helps during the process? The latter contribution would be
transformational. In addition, the computer's role in introducing a much closer integration between
instruction and assessment would be transformational. How much promise do computers have for
transformational applications?

In the next section, these and other issues will be addressed. We will show that computers
have so far been used primarily to substitute for and incrementally improve item tests. But this paper
will show that computerized educational assessment offers great promise for introducing the help
systems and the formative evaluation systems that will enable teachers to develop professionally in
enhanced roles as assessors and managers; students to achieve higher, more complex achievement
objectives than are measured by conventional tests; and educational systems to evolve through
measurement and formative evaluation toward the kinds of productive systems our nation needs.

SECTION II. CURRENT USES OF COMPUTERS IN ASSESSMENT

TYPES OF COMPUTERIZED MEASUREMENT SYSTEMS

The term Computerized Educational Assessment System (CEA System) will be used in this paper
to refer to a system that uses computers in some part of the administration process for an educational
measurement instrument. Computerized systems for administering tests can be grouped into six
categories, depending upon the mode of administration used. All of the six CEA System types use
computers in the processes of scoring and reporting of results. The six CEA Systems are:

Portable, Non-Interactive Answer Media

1) Scanned answer sheet systems

2) Portable keypad systems

3) Bar code readers

Interactive Testing Stations

4) Computer work stations

   o Learning stations

     > in Classroom (Group Display)

     > in Cluster (Individual or Small Group Use)

   o Specialized lab work stations

5) Customized simulator environments

6) Specialized notebook computer systems

The six CEA System types are grouped into two categories: Portable Non-Interactive Answer Media and
Interactive Testing Stations. Answer sheet systems do not use computers to present the displays or accept
the responses, but do use computers to scan the answer sheets, score them, and print out reports. This
widely used testing technology exists because of computer technology and paper handling technologies.

Furthermore, its portability is one of its important assets, since testing can occur in any room with
suitable desks or tables. Portable keypad systems and barcode readers that do not interact can substitute
for the answer sheets and the scanners. The responses go directly into a digital form without the
intermediate scanning step. The barcode readers are currently being used to aid in the process of holistic
scoring of student essays. Many national and state testing programs now incorporate student essays which
must be graded by two or more human assessors. The barcode readers help manage the data and aid in
rapid decisions to add a third reader if the first two ratings diverge widely.
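
The third-reader decision described above can be sketched in a few lines of Python. This is only an
illustrative sketch, not the procedure of any particular testing program: the function names, the
one-point divergence threshold, and the use of a simple average to combine ratings are all assumptions.

```python
from statistics import mean


def needs_third_reader(rating_a, rating_b, max_gap=1):
    """Return True when two holistic ratings diverge by more than max_gap points.

    max_gap is a hypothetical threshold; a real program would set its own.
    """
    return abs(rating_a - rating_b) > max_gap


def combined_score(ratings):
    """Combine the available ratings by averaging (an assumed combination rule)."""
    return mean(ratings)


# Two readers agree closely: no third reader, average the two ratings.
first, second = 4, 5
if not needs_third_reader(first, second):
    print(combined_score([first, second]))

# Two readers diverge widely: a third rating is requested before combining.
first, second, third = 2, 5, 4
if needs_third_reader(first, second):
    print(combined_score([first, second, third]))
```

Keeping the threshold as a parameter reflects the fact that the text says only that ratings must
"diverge widely"; it does not state how wide a gap triggers the third reading.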

Computer workstations and customized simulation environments are not portable, so a new
infrastructure must be set up consisting of local rooms equipped with a sufficient number of computer
workstations or simulators. Schools have learning centers with workstations that can be used for
instruction or for measurement. Interconnection to a central file server is desirable to collect the records
from each student's testing session.

The notebook computer system is so small and portable that it has the potential to enable interactive
testing to compete with scanned answer sheets. When fully developed for computerized measurement,
such systems would also replace the portable keypad systems and barcode readers.

Essential Processes in Administering an Educational Measurement

Considering the six processes of test administration listed below reveals some of the [...]
and statistical analysis. These are discussed briefly later in this section.

1) Presenting Item or Test Displays: Three methods are in use: portable displays for each
[...], interactive computer displays used one-on-one, and group displays.

2) Obtaining a Record of Responses: The most common method is to use the scannable answer
sheet. It is most often used in an individually paced mode. That is, the students are given a certain
amount of time to work a certain number of items that they can sequence in any manner, and they can
arrange the time as they wish. When the items are presented in a group mode, for example, when an auditory
listening test is given in language assessment, students are all given the same amount of time to complete
each item. Interactive computers in labs or learning centers are individually paced and use
response entry devices (keyboard, mouse, joystick, touch screen, voice pickup). Notebook computers are
portable and would lend themselves either to individual pacing or to group pacing.

3) Scoring and reporting: Getting the responses into the computer and providing a scoring key
in the computer is the challenge. Scanners for answer sheets have made mass group testing possible.
They can and should be improved to go beyond multiple choice items. Developing a scoring key for
complex performance tasks, even when all responses are collected by computer, is no easy task. Once
responses are scored, software for generating a variety of reports is available and is quite mature.

4) Interpretation of results for individuals: Conventional answer sheet systems are scored
objectively and require no interpretation during the scoring. This has a cost advantage and removes the
bias from subjective rating, since human graders are not immune to bias or to differing interpretations.
Interpretation of the results comes after the answer sheet is sent back to be scanned, scored, and the
results printed out. On some clinical tests, computerized tools are now in use to interpret profiles of scores
from psychological instruments that produce a profile. The best of these are "expert systems" that have
captured the rules expert assessors use to interpret profiles.

Individually administered tests of intelligence and of psychological diagnosis require human
assessment throughout. When tests are individually administered, the expert human administrator judges
and scores each performance for each task. Sometimes the score is objectively determined, but more often
the student's vocalizations or movements must be interpreted.

5) Monitoring: [...]
to assure standardization and fairness, and to deter cheating. Computer methods have been used in
computerized testing centers: video cameras, time control, alarm lights, etc. Monitoring for overall test
security goes beyond session monitoring. The security of test booklets and answer keys must be
maintained during development and distribution, as well. Electronic distribution provides new means, such
as encryption, to solve this old security problem in high-stakes testing. It also provides new risks because
it offers those so inclined new ways to gain access to the items and the keys, and the possibility for
tampering with the scores.

6) Special practices for special populations: Handicapped people [...]
may be given extra time. The visually impaired may require a human reader. Computers, i.e. educational
testing stations, can be equipped with special response devices (audio, braille keyboards, unconventional
response devices for physically handicapped people, etc.). Willingham* has published a comprehensive
volume on issues in testing handicapped people with conventional tests.

More work is needed to integrate the creative engineering for computer interfaces for handicapped
people into assessment practices. Interactive computer displays can blow up printed information, provide
headphones and volume controls for audio, and can provide a variety of special response devices for those
who are physically handicapped and cannot use a keyboard or mouse. Computers can also control time
intervals very precisely, and it is possible that a much more equitable way to adjust the timing for
handicapped people can be determined. For example, it takes blind students longer to read a passage in
Braille than sighted students take to read the text. Perhaps the computer could determine when the
student had finished reading and then provide equal time for answering. By extension, it might be possible
to provide different amounts of time equitably for test takers from different language groups. As
Braille is more time-consuming than English text, reading an English text is more time-consuming for a
native Spanish-speaker than for a native English-speaker.

*Warren Willingham et al., Testing Handicapped People (Boston, MA: Allyn and Bacon, Inc., 1988).
These authors considered students who were learning disabled, hearing impaired, visually impaired, and
physically handicapped. Their work was sponsored by the College Board, Educational Testing Service, and
the Graduate Record Examinations Board, so the research was conducted with paper and pencil and
multiple choice and standardized essay tests.

ADVANTAGES AND DIFFICULTIES OF ANSWER SHEET SYSTEMS



Interactive testing stations have several challenges if they are to compete with the
predominant use of answer sheet systems. Consider the following strengths of answer sheet systems and
item tests.

1. Relatively low cost administration, scoring, and reporting. Many gradations of scanning and
processing equipment are available, so each application can find a cost-effective implementation.
Processing large volumes of tests offers economies of scale.

2. Portability. Usability in many locations and settings.

3. Widespread general acceptance. The public is familiar with test booklets and answer
sheets and accepts them with equanimity. Studies with new item types that require people to
think, rather than eliminate and, if necessary, guess, reveal that thinking items are [...].
Students agree that open-ended items that do not give the answer among a set of distractors are
"probably more valid". Still, they find them much harder and more time consuming, and they dislike
them.

4. Coverage. Because each item takes such a short time, much more content and variations in
cognitive demand can be covered in the time available for testing.

5. Reliability. Related to the larger number of items that can be taken in a given time period, the
scores are more reliable than with tests consisting of fewer items.

6. Predictive Validity. The admissions tests predict first-year grades.

7. An existing infrastructure, both technological and human. From central sites to the network of
part-time test administrators, conventional tests have a well-established infrastructure that is not
in danger of being supplanted soon. Capitalizing new hardware for each school is not enough.
It is also necessary to train the larger group of test developers, test statisticians, teachers and
other users how to use the equipment and software effectively.



Criticisms of Item Test/Answer Sheet Systems

Among the many criticisms of the common testing format are:

1) That our tests rely on many brief and unconnected items, and thus measure only the temporary
existence of snippets of knowledge. Instead, students need to develop an integrated personal
knowledge structure that can provide the learner with an organized, powerful, generative [...]
form of knowledge useful in adapting to changing circumstances.

2) That our tests measure less important outcomes and miss many things of obvious importance to
success in an increasingly technological society - such things as critical thinking and reasoning,
problem-solving, trouble-shooting, flexibility, creativity, motivation, persistence, and strategies for
learning and self-control.

3) That the dominant objective item formats, in particular multiple-choice, promote strategies that
have nothing to do with important educational outcomes - strategies of elimination, guessing, time
allocation.

4) That focused preparation for a broad collection [...]
formats takes time that should be spent on attaining the deep and powerful knowledge and thinking
skills needed to be a contributor after [...].

5) That the use of narrow tests of minimums focuses instruction and squeezes out the more difficult
but desirable outcomes.

6) That the tests are artificial, indirect, and not given in an interesting, integrated, real-world
context.

7) That the tests are biased toward certain ethnic and gender groups.

8) That the items have but one correct answer, portraying education as a search for a collection of
little right answers, rather than as a process of forming one's own questions and evaluating several
partially right alternatives to complex problems.

Minimum competency testing, considered in the next section, has come in for special criticism.

Minimum Competency Testing and Its Effects on What, How, and Whom We Teach

Varying federal, state and district policies for mandated achievement testing using standardized
norm-referenced and criterion-referenced tests have tended to narrow the curriculum, encouraging teachers to
teach to the test. As a result, classroom teachers devote a significant portion of their time to test preparation
and practice testing prior to the mandated state and/or national tests. This narrowing of the curriculum
has focused classroom instruction on an outdated notion of basic skills (basic reading and arithmetic) and
has left little time for teaching science and other complex subjects, or for advanced higher-
order thinking, reasoning, and problem solving skills.

Clearly, there is a direct relationship between what we test and what we teach. Two maxims
concerning assessment programs summarize the relationship succinctly: "What you test is what you get"
and "What you don't test you don't get". Mehrens and Kaminski have written concerning the issue
of how closely teachers should teach to a test. They discuss a continuum of teaching to the test and
recommend that the point where one crosses from appropriate to inappropriate methods depends on the
inferences the teacher wishes to make from the test scores. If the teacher is interested in inferring to
a larger domain of knowledge, then it is inappropriate to teach narrowly and specifically to
[...] objectives drawn from that larger domain.

Standardized item tests have also significantly affected how we teach (and how we think about
teaching). Since standardized items are generally quite short, and have one and only one correct answer,
we teach students that knowledge can be broken down into short, simple additive components. We might
not teach students that the solution and problem solving path we use is as important as the answer.
Likewise, we have not taught students that there may be several alternative correct answers. In our
attempts to cover the broad scope of material addressed in the state curriculum plans, we have not spent
sufficient time on the "powerful ideas" and the "core, essential concepts" of the disciplines. If achievement
tests are the operational goals, they offer a way to select and narrow to a smaller target.

Lauren B. Resnick, "Tests as Standards of Achievement in Schools," paper
presented at the 1989 Educational Testing Service Invitational Conference,
The Uses of Standardized Tests in American Education, New York,
October 28, 1989.

Lauren B. Resnick and D. P. Resnick, "Assessing the Thinking Curriculum: New
Tools for Educational Reform," in Bernard R. Gifford and M. C. O'Connor (Eds.)
[...], April 1989.

William A. Mehrens and John Kaminski, "Methods for Improving Standardized
Test Scores: Fruitful, Fruitless, or Fraudulent?", Educational Measurement:
Issues and Practice, Spring 1989, pp. 14-30.

We need to teach a smaller number of powerful ideas well, rather than a broad [...]
without a coherent structure. Furthermore, we have tended to view the content disciplines of
mathematics, reading, writing, and science as separate and independent knowledge domains. We do not
teach students to understand the relationships within and among these knowledge domains. We typically
do not have students write or speak about their own mathematical ideas, or read original source materials
about great mathematicians.

Finally, by advocating the use of standardized, item-based tests, the minimum competency movement
has influenced whom we teach. For example, results on standardized achievement tests are generally used
as a primary indicator for retaining students at their current grade level rather than providing appropriate
instruction and remediation so these students can progress to the next grade with their peers.

A recent three-year study by the National Commission on Testing and Public Policy, entitled "From
Gatekeeper to Gateway: Transforming Testing in America", charged that the American testing system
has become a "hostile gatekeeper" which limits opportunities for many students, particularly women and
minorities. The commission called for an innovative, transformed assessment system which would "open the
gates of opportunity for America's diverse people."

Superintendents often demand that principals and teachers "raise test scores" [...]
goal. It is not emphasized that the test scores are only proxies or indicators for the real learning
outcomes which we expect from schools.

As Shepard points out, teachers may divert students away from good instruction when using a
standardized test to identify mild handicaps. The results from these tests significantly harm students by
labelling them. The label becomes the explanation for the observed behavior, "He cannot read because he
is learning disabled". Then they are redirected into less challenging classes, with lower expectations, and
where there is less teacher encouragement and pressure for learning progress. Shepard also discusses the
effects of errors of measurement. In one study nearly half of the students labelled as learning disabled
were really normal or were average performing students in above-average performing classes or schools.

SUBSTITUTIVE, INCREMENTAL, AND TRANSFORMATIONAL APPLICATIONS OF INTERACTIVE
TESTING STATIONS



Three Stages in Technology

The U.S. Office of Technology Assessment has found that the diffusion of a technology goes through
three typical phases. These phases are:

1) The substitution stage: The newest technology is used as a more efficient substitute for the
older manual or labor-intensive procedures. For example, the first applications of computers to
assessment were simple computerized tests that duplicated the exact items in the exact item
sequence, and used the same scoring procedures as their printed test equivalents.

2) The stage of incremental improvements: Working with the implementation of technology at the
substitution level, inventive people soon discover incremental improvements that utilize features
of the technology not used in its substitution phase. Examples will be given of tests where items
are selected dynamically according to a mathematical model. Another incremental improvement
still making an impact is the move from the monochromatic printed content of most [...].

3) The introduction of new approaches [...]: Inherent features of
the technology that were dormant during the substitution and incremental improvement stages
are introduced in this phase. The goal of the original activity is reconceptualized at a deeper [...].

National Commission on Testing and Public Policy, From Gatekeeper to
Gateway: Transforming Testing in America (Chestnut Hill, MA: National Commission
on Testing and Public Policy, 1990).

Lorrie A. Shepard, "Identification of Mild Handicaps," in Robert L. Linn
(Ed.), Educational Measurement, Third Edition (New York, NY: Macmillan Publishing
Company, 1989), pp. 545-572.


Use of Interactive Testing Stations to Administer Item Tests

The second edition of Educational Measurement (1971) includes some early references to the potential
[...] computers. These original computerized testing applications typically employed mainframe
and minicomputer systems accessed by computer terminals. Widespread use of computers for testing was
significantly spurred by the advent of the integrated [...]
first personal microcomputers in the late 1970's.

Typically, computerized testing involves the conversion or translation of [...]
test to a computer-administered [...].

Table 6 highlights the major benefits and limitations of computerized [...]
paper-and-pencil tests.



Table 6

Benefits and Limitations of Computerized Testing

Technology Benefits

o Greater standardization of test administration
o Immediate test presentation
o Immediate test scoring and reporting
o Enriched display and response capabilities
o Allows new item types and item formats
o Reductions of certain types of measurement error
o Ability to measure response latency for items and components
o Improved capabilities for score analysis and interpretation
o Improvements in test security
o Easy aggregation of testing records
o Creation of customized tests and items by computer

Technology Limitations

o Limited number of computer terminals per school
o Incompatible computer hardware and software
o Need for equating studies between computerized and paper-administered tests
o Limited computer experience of some students
o Need to address equity, bias, and legal issues
o Possibility of new types of measurement error from computerized testing
o Lack of imaginative item types when multiple choice format is copied from paper tests

Research reviews have examined comparability of test scores from computerized tests and paper-and-
pencil tests. These studies typically show no significant differences or only slight test score differences
in favor of one or the other testing mode. These differences are of little practical significance. Recent
reviews hypothesize that these small mean score differences may be due to specific user characteristics
of a small portion of examinees. These user characteristics might affect performance negatively on
computer-administered tests more than on paper-administered tests for a small portion of examinees.

Examples of Computerized Tests

The computer administered conventional test (CT) is the most widespread of the types of
computerized assessments. Computerized tests are employed to measure generalized achievement, to
diagnose skills and learning capabilities, personality characteristics, learning aptitudes, and mastery of
instructional objectives.

An illustrative computerized test is the WICAT Comprehensive Assessment Test. This
computerized testing product includes comprehensive tests of reading, mathematics, and language arts for
grades K-8. The computerized tests measure a common core of objectives addressed by all of
the major standardized achievement tests. A testing management system is provided for selecting students
and tests, for sequencing the tests, administering the tests, and generating computerized testing [...].
The computerized test items include text, graphics, and digitized voice-quality audio. Directions for taking
the test are given with text, graphics, and digitized voice-quality audio.

Educational Testing Service has developed computerized versions of two College-Level Examination
Program (CLEP) tests and is currently conducting research to verify comparability of scores from the
computerized and paper-and-pencil tests. One of the CLEP tests employs digitized, photographic
illustrations to test artistic judgment. Educational Testing Service has also developed interactive
assessment videodisc demonstration projects for medical certification and English as a second language.

District-wide implementations of computerized testing have been demonstrated for measuring state
assessment objectives. Widespread use of computerized professional certification tests has also been
demonstrated.

C. Victor Bunderson et al., "The Four Generations of Computerized
Educational Measurement," in Robert Linn (Ed.), Educational Measurement, Third
Edition (New York: Macmillan Publishing Company, 1989).

John Mazzeo and Anne L. Harvey, "The Equivalence of Scores from Automated
and Conventional Versions of Educational and Psychological Tests: A Review of
the Literature," Research Report No. CBR 87-8, ETS RR 88-21 (Princeton, NJ:
Educational Testing Service).

Steven L. Wise and Barbara S. Plake, "Research on the Effects of
Administering Tests via Computer," Educational Measurement: Issues and Practice,
vol. 8, no. 3, Fall 1989, pp. 5-10.

Wicat Systems, Wicat Comprehensive Assessment Test (Orem, UT: Wicat Systems, 1990).

[...], "ETS Innovations in Assessment" (Princeton, NJ:
Educational Testing Service, 1990).

[...], Randy Bennett, and S. Swinton, Design of an Interactive
Assessment Videodisc Demonstration Project (Princeton, NJ: Educational Testing
Service, 1986).

[...], Charles Price, Mike Strozeski, and Idolina [...],
"[...] Validation of a Computerized Test for [...],"
Educational Measurement: Issues and Practice, vol. 9, no. [...], 1990.

Computerized Psychological Testing and Interpretation

An emerging area of computerized testing research and implementation is computerized psychological
tests and interpretations. Computerized testing versions have been developed for several psychological
tests such as the Minnesota Multiphasic Personality Inventory, Ohio Vocational Interest Survey,
Self-Directed Search, Functional Skills Screening Inventory, Marital Adjustment and cognitive style tests.

Computer Based Test Interpretations (CBTI) are also provided for several psychological [...]
and vocational related tests. Computerized test interpretations provide detailed text and graphics based
reports which interpret the results of psychological tests. Professional guidelines have been adopted by
the American Psychological Association concerning computer based psychological tests and test
interpretations.

INCREMENTAL IMPROVEMENTS IN COMPUTER-ADMINISTERED TESTS

1. Computerized Adaptive Testing

Reckase notes that, "Adaptive testing, the dynamic selection of items to match the performance of
an examinee during the administration of a test, has finally become a readily accessible methodology for
use in standardized testing programs." This form of testing has achieved a milestone with the
publication of a comprehensive "Primer" that presents, in terms as accessible to lay readers as possible,
the significant benefits and strong technical foundations behind this important innovation in testing.

A computerized adaptive test is a computerized test in which the next item or task is adaptive or
tailored depending on the examinee's previous responses. In a computerized adaptive test, an item of
average difficulty is administered first. If the examinee answers the item correctly, a more difficult item
is presented. If the examinee answers the item incorrectly, a less difficult item is presented. The adaptive
testing process continues until a specified stopping rule is reached and the testing process terminates.
Typical adaptive test termination criteria include a fixed number of test items, a minimum standard error,
or a maximum information value.
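
The adaptive procedure just described can be illustrated with a short sketch. It is not the method of any
particular testing program: it assumes a one-parameter (Rasch) item response model, a hypothetical item
pool represented only by item difficulties, a crude grid-search ability estimate, and a stopping rule that
combines a fixed maximum number of items with a target standard error. All function and parameter names
are illustrative.

```python
import math

def p_correct(theta, b):
    """Rasch model (assumed here): chance of a correct answer to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses):
    """Grid-search maximum-likelihood ability estimate, bounded to [-4, 4]."""
    grid = [g / 10.0 for g in range(-40, 41)]

    def loglik(theta):
        total = 0.0
        for b, correct in responses:
            p = p_correct(theta, b)
            total += math.log(p if correct else 1.0 - p)
        return total

    return max(grid, key=loglik)

def standard_error(theta, responses):
    """Standard error from the test information accumulated so far."""
    info = sum(p_correct(theta, b) * (1.0 - p_correct(theta, b)) for b, _ in responses)
    return 1.0 / math.sqrt(info)

def adaptive_test(pool, answer, max_items=20, target_se=0.4):
    """Administer items one at a time; `answer(b)` returns True for a correct response."""
    remaining = list(pool)
    theta = 0.0          # start with an item of average difficulty
    responses = []
    while remaining and len(responses) < max_items:
        # Select the unused item whose difficulty is closest to the current estimate.
        item = min(remaining, key=lambda b: abs(b - theta))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta = estimate_theta(responses)
        if standard_error(theta, responses) <= target_se:
            break        # stopping rule: measurement is precise enough
    return theta, responses
```

A correct first answer raises the ability estimate, so the next item selected is harder; an incorrect
answer lowers it. An operational system would replace the grid search and the simple closest-difficulty
rule with calibrated item parameters and an information-based selection procedure.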

Computerized adaptive testing is based on pioneering developments in item response theory.



… Computerized Assessment in Texas (Provo, UT: Waterford Testing Center, 1986).

… Bugbee, Jr., "Students Prefer Computer Administered Testing,"
Educational Measurement: Issues and Practice, vol. 8, no. 4, Winter 1989, p. 28.

American Psychological Association, Committee on Professional Standards
and Committee on Psychological Tests and Assessment, Guidelines for Computer-Based
Tests and Interpretations (Washington, D.C.: American Psychological Association).

Mark D. Reckase, "Adaptive Testing: The Evolution of a Good Idea,"
Educational Measurement: Issues and Practice, vol. 8, no. 3, Fall 1989, pp. 11-

Howard Wainer (Ed.), Computerized Adaptive Testing: A Primer
(Hillsdale, NJ: Erlbaum Assoc., 1990).

Frederic M. Lord, Applications of Item Response Theory to Practical Testing
Problems (Hillsdale, NJ: Lawrence Erlbaum Associates, 1980).

Ronald K. Hambleton, "Principles and Selected Applications of Item Response




Computerized adaptive tests include four major components: a pool of test items from which the test is
created, a procedure for selecting items from the pool, a method for computing the test score when the
test is completed, and a means for determining when the testing should be terminated. Additional
information on the components and standards for computerized adaptive tests has been documented.
Computerized adaptive tests yield all of the benefits presented above for computerized tests. In
addition, computerized adaptive tests also provide the following benefits and limitations presented in
Table 10.

Table 10

Benefits and Limitations of Computerized Adaptive Tests

Technology Benefits

o Increased measurement precision with significantly fewer items
o Tests/items are individually selected according to each examinee
o Increased testing efficiency with time savings of 60% to 70%
o Accurate measures at all ability levels
o Improvements in test security
o Ideal for ranking and grading. Spreads people out along a single dimension.

Limitations

o Limited number of computer terminals in schools
o Requires large numbers of student responses for item calibration and analysis
o Unidimensional item response and scaling model does not necessarily reflect stages
  of complex cognitive growth
o Schools are not structured to take advantage of the time savings from adaptive tests
o Requires more advanced test interpretation skills



Examples of Computerized Adaptive Testing. One representative example of computerized adaptive
testing is the College Board Computerized Placement Tests developed jointly by the College Board and
Educational Testing Service. The Computerized Placement Tests are computerized adaptive tests
designed for use by two- and four-year colleges to assess whether entering students are ready for college
level work in English, reading, and mathematics, or need additional developmental courses. These tests
have been used for a period of four to five years at approximately 80 colleges across the country.

An additional example of computerized adaptive testing is the Differential Aptitude Tests,
Computerized Adaptive Edition published by Psychological Corporation. This test battery provides



Theory," in Robert L. Linn (Ed.), Educational Measurement, Third Edition (New York,
NY: Macmillan Publishing Company, 1989), pp. 147-200.

Ronald K. Hambleton and Hariharan Swaminathan, Item Response Theory:
Principles and Applications (Boston, MA: Kluwer Academic Publishers, 1985).

Bert F. Green, R. Darrell Bock, Lloyd G. Humphreys, Robert L. Linn, and
Mark D. Reckase, "Technical Guidelines for Assessing Computerized Adaptive Tests,"
Journal of Educational Measurement, vol. 21, no. 4, Winter 1984, pp. 347-360.

… Computerized Placement Tests: A Revolution in Testing Instruments
(New York, NY: College Board).

The Psychological Corporation (1986). Differential Aptitude Tests,
Computerized Adaptive Testing Edition (San Antonio, TX: The Psychological
Corporation, Harcourt Brace Jovanovich, Inc.).

League for Innovation in the Community Colleges, Computerized Adaptive Testing:

computerized adaptive tests for the following aptitudes: verbal reasoning, numerical ability, abstract
reasoning, clerical speed and accuracy, space relations, spelling, and language usage. This computerized
adaptive testing battery is available in both IBM PC and Apple II versions. The test is used in junior and
senior high schools.

With the emergence of microcomputers, computerized adaptive testing has now become feasible for
widespread operation and implementation, as well as for research. Within the past few years, a wide
variety of computerized adaptive testing systems have been developed, demonstrated, and implemented
by organizations including: American Institutes for Research, American College Testing Program,
Assessment Systems Corporation, Educational Testing Service, Psychological Corporation, and
… Systems. Computerized adaptive testing applications are presently available for achievement testing,
aptitude testing, compensatory education selection tests, college entrance and placement tests,
professional certification, and licensure tests. Computerized adaptive tests have also been developed for
district and statewide assessment.



In comparison to computerized adaptive testing, which attempts to obtain accurate measurement across
a broad range of proficiency levels, computerized mastery tests seek accurate measurement around
the cut score or decision point which separates masters from non-masters. The computerized mastery
test presents items which help to discriminate examinees above and below the mastery cut score.
Computerized mastery testing is the preferred model for most certification and licensing
programs. The theory and procedures for computerized mastery tests have been documented.

Examples of Computerized Mastery Tests

Educational Testing Service has developed a computerized mastery test for the National Council of



The State of the Art in Assessment at Three Community Colleges (Laguna Hills,
CA: League for Innovation in Community Colleges, 1988).

Mark D. Reckase, "Adaptive Testing: The Evolution of a Good Idea,"
Educational Measurement: Issues and Practice, vol. 8, no. 3, Fall 1989, pp. 11-

… Lawrence Rudner, and Lauress Wise, "Computerized Adaptive Tests,"
ERIC Clearinghouse on Tests, Measurement, and Evaluation, Digest no. 107
(Washington, D.C.: American Institutes for Research, February 1989).

G. Gage Kingsbury, "Adapting Adaptive Testing with the MicroCAT Testing
System," Educational Measurement: Issues and Practice, vol. 9, no. 2, Summer 1990,
pp. 3-6.

… Dianne Buhr, and Robert Wickham, "Adaptive Testing for State-Wide
Assessment," MicroCAT News, March 1989, pp. 1, 4, 5.

… "Computerized Adaptive Testing in the Montgomery County,
Maryland Public Schools," MicroCAT News, April 1987, pp. 1, 4.

Frederic M. Lord, Applications of Item Response Theory to Practical Testing
Problems (Hillsdale, NJ: Lawrence Erlbaum Associates, 1980).

David J. Weiss and G. Gage Kingsbury, "Application of Computerized Adaptive
Testing to Educational Problems," Journal of Educational Measurement, vol. 21,
pp. 361-375.

Architectural Registration Boards. Test questions are organized into a series of short structured
testlets designed to match the overall test content specifications and to provide equivalent measurement
characteristics. It would be inappropriate to select individual items randomly from a pool or according
to difficulty level as in a computerized adaptive test. To do so would almost always violate the rules for
content, coverage and balance found in the test's specification. Testlets also serve to correct unexpected
context effects; for example, if the computer selects two items from a pool, the first item might give away
the answer to the second one inadvertently. A minimum number of testlets are drawn randomly from
the available pool of testlets and then administered to the examinee. At the conclusion of each testlet, a
decision is made concerning whether or not the examinee should be classified as a master or non-master
based on performance from the combined testlets. Computerized mastery tests typically require only half
of the questions administered in the conventional paper-and-pencil format.
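
The testlet procedure above can be sketched as follows. This is only an illustration under simplifying
assumptions: the decision after each testlet uses fixed cumulative proportion-correct thresholds
(hypothetical values `pass_cut` and `fail_cut`) in place of the statistical decision rules an operational
program would use, and all names are invented for the example.

```python
import random

def mastery_test(testlet_pool, answer, min_testlets=2, max_testlets=6,
                 pass_cut=0.80, fail_cut=0.50, rng=None):
    """Administer randomly drawn testlets; return (classification, testlets_used).

    `testlet_pool` is a list of testlets, each a list of items.
    `answer(item)` returns True when the examinee answers the item correctly.
    """
    if not testlet_pool:
        raise ValueError("testlet pool is empty")
    rng = rng or random.Random()
    # Draw whole testlets at random, never individual items.
    order = rng.sample(testlet_pool, k=min(max_testlets, len(testlet_pool)))
    correct = total = 0
    for count, testlet in enumerate(order, start=1):
        for item in testlet:
            correct += 1 if answer(item) else 0
            total += 1
        if count < min_testlets:
            continue                      # always administer the minimum number first
        proportion = correct / total      # performance on the combined testlets
        if proportion >= pass_cut:
            return "master", count
        if proportion <= fail_cut:
            return "non-master", count
    # No clear decision within the limit: fall back to the passing threshold.
    return ("master" if correct / total >= pass_cut else "non-master"), len(order)
```

Because clear masters and clear non-masters are classified after the minimum number of testlets, only
examinees near the cut score see the full set, which is one way the reduction in test length arises.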

3. Computer-Based Diagnostic Testing. A particularly intriguing application of computerized testing for
educational purposes is the computer-based diagnostic test. A computer-based diagnostic test attempts
to identify the specific conceptual, procedural, or performance errors which the student makes in responding
to test items or testing situations. Some diagnostic tests attempt to diagnose and classify cognitive errors
within a generalized problem solving domain. These errors are often referred to as misconceptions or
cognitive "bugs."

4. Incremental Improvements to the Display of

Computerized Video, Graphics and Animation Tests. The increasing capability of microcomputers
to display still frame video, motion video, high resolution graphics, and animation makes possible
more realistic and challenging types of test items. These capabilities provide for assessment of interactive
and dynamic characteristics similar to real life situations. With the video and photographic display
capabilities of microcomputers, educators can develop and administer tests of science, social studies, art,
and languages which are very realistic and life-like. How much better would be a science test which
included photographs, motion segments, and high resolution animated color graphic displays of science
concepts and processes?

Examples of Video, Graphics and Animation Tests. In 1979 the National Science Foundation funded



"ETS Innovations in Assessment" (Princeton, NJ: Educational Testing Service, 1990).

Howard Wainer and G. L. Kiely, "Item Clusters and Computerized Adaptive Testing: A Case for
Testlets," Journal of Educational Measurement, 24 (1987), pp. 185-201.

Kikumi Tatsuoka, Diagnosing Cognitive Errors: Statistical Pattern Classification and Recognition
Approach (Urbana, IL: University of Illinois, Computer-Based Education Research Lab, 1985).

David L. McArthur, Diagnostic Testing Project (Los Angeles, CA: University of California at Los Angeles,
Center for the Study of Evaluation, 1985).

Garlie A. Forehand and M. W. Rice, "Diagnostic Assessment in Instruction," Machine-Mediated
Learning, Vol. 2, No. 4, 1988, pp. 287-296.

Isaac I. Bejar, "Educational Diagnostic Assessment," Journal of Educational Measurement, Vol. 21, No. 2,
1984, pp. 176-189.

a proof of concept study for a computer-controlled videodisc addressing college-level developmental
biology. This videodisc included computerized testing components, some with motion video segments
(e.g., unraveling of the DNA molecule), still-frame video displays, animated graphics, and high resolution
graphic display items. Evaluations of the videodisc tests showed that students effectively learned and
retained the information presented in motion video, video still frame, and animated displays.

A computerized graphics and animation test has been developed to test the science process skills of
variable identification, hypothesis formation, operational definition, experimental design, and interpretation
of data. The test demonstrated high reliability, and difficulty and discrimination indices which were
acceptable for evaluating criterion referenced achievement. A computerized animation test has been
developed for a three-dimensional spatial rotation task. The test included 80 three-dimensional rotation
items created from eight basic graphic figures.

POTENTIALLY TRANSFORMATIONAL APPLICATIONS OF COMPUTERS

1. Current Uses of Computers to Administer Standardized Performance Tasks

An extensive, two-semester, fifteen-unit Physical Science course has been developed by the Texas
Learning Technologies Group. It is in use in eleven other states besides Texas. In the minimum
configuration, videodisc or computer displays are presented by the teacher on a monitor at the front of the
class. Students work in small groups of four or five around videodisc-equipped computers … in
some classrooms or in a learning center. These same computers can be used for individualized tutorials
for individual students. The Texas Physical Science curriculum includes a variety of …, but these
are not scored as a part of the assessment. It is an interesting commentary on the state of the art in
scoring standardized performance tasks that Educational Testing Service was approached to assist TLTG
in developing the assessments for this innovative curriculum and developed a set of multiple choice,
paper-and-pencil administered tests (ETS is involved in other projects involving the scoring …

Recent nationwide trends in educational assessment favor the use of performance-based assessments
as alternatives to the traditional multiple-choice standardized tests. Performance tasks require
students to publicly display and effectively use their personal knowledge and skills to write, discuss, think,
solve complex problems, and conduct experiments. Examples of performance tasks considered by states



Bunderson, C.V., Baillio, B., Olsen, J.B., Lipson, J.I., and Fisher, K.M., "Instructional Effectiveness
of an Intelligent Videodisc in Biology," Machine-Mediated Learning, 1984.

Michael E. Hale, Development of a Computer Animated Science Process Skills Test, paper presented
at the Annual Meeting of the National Association for Research in Science Teaching (New Orleans, LA:
April 1984).

Isaac I. Bejar, A Psychometric Analysis of a Three-Dimensional Spatial Task (Princeton, NJ:
Educational Testing Service, 1986).

Borich, Gary D., "Outcome Evaluation Report of the TLTG Physical Science Curriculum, 1988-89,"
The University of Texas at Austin.

involve direct writing assessments, open-ended mathematics and reading items, integrated reading and
writing exercises, and hands-on science experiments.

Results from a recent survey show that nearly half of the nation's states are developing, or plan to
develop, performance tasks as a significant component of their statewide assessments. The states of
California, Connecticut, Massachusetts, New York, and Vermont are currently implementing statewide
performance assessments. The states of Alaska, Arizona, Colorado, Florida, Hawaii, Kentucky, …,
Missouri, New Jersey, North Carolina, Oregon, and Pennsylvania are currently developing statewide
performance measurements. Additional information on performance measurement systems can be found
in the following references. Performance tasks have recently received national educational support and
interest from a coalition of three dozen educational and civil rights groups.

One widespread performance task is the direct writing assessment using standardized essay
prompts, currently used in the National Assessment of Educational Progress, the General Educational
Development Testing Service, twenty-eight statewide assessments, and in the College Board Advanced
Placement Tests. The writing performance task requires students to write a brief essay(s) in response
to a specific writing prompt(s). Current computer measurement technology has been applied to the direct
writing tests in the following areas:

o Banks of writing prompts,

o Word processors as alternatives for students to use in creating the written essays,

o Text data bases and editors which teachers can use for storing, retrieving, and managing the
  student essays,

o Barcode readers for recording holistic writing scores.

Future computer technology developments are expected to provide additional capabilities for assessing
student writing performance including: computerized handwriting recognition systems, automated
handwriting and text conversion systems, and automated scoring of student essays.

Examples of Performance Tasks. Educational Testing Service has just announced a computerized
portion of the National Teacher Examinations, the most widely used teacher licensing exam. In the



Pamela R. Aschbacher, research described in Robert Rothman, "New Tests Based
on Performance Raise Questions," Education Week, September 12, 1990, pp. 1, 10.

Lauren B. Resnick, Education and Learning to Think (Washington, D.C.:
National Research Council, 1987).

Doug A. Archbald and Fred M. Newmann, Beyond Standardized Testing: Assessing
Authentic Academic Achievement in the Secondary School (Reston, VA: National
Association of Secondary School Principals, 1988).

… Baron, "Blurring the Edges Among Assessment, Curriculum and
Instruction," paper presented at the Education Commission of the States and
Colorado Department of Education Assessment Conference, Boulder, CO, June 1990.

… Genuine Accountability, "Statement on Genuine
Accountability," Education Week, January 31, 1990, pp. 1, …

… Diegmueller, "E.T.S. Previews Revamped Examination for Teachers," Education Week, Vol. 10,
No. 2, September 12, 1990, pp. 1, 18.

computerized test, the teacher candidates … multiple-choice responses. The teachers will also be
asked to write brief computerized essays. Adaptive testing features are also included.

The National Board of Medical Examiners and the National Council of State Boards of Nursing are
currently developing and pilot testing computerized adaptive performance tests for the licensing
of physicians and nurses. These tests include applied patient management problems with multiple correct
answers, open ended performance items, and medical simulation exercises.

The military and commercial training industries have also developed a wide variety of …
simulators. Many of these work-based performance tasks and simulations employ
computer technology for developing, administering, scoring and reporting. These computerized
performance tasks range from the combat … simulators which run on single
or networked personal computers to the complex … full-flight simulations which are used
to train airline pilots and flight …



2. Current Uses of Student Products

Increased national interest in performance testing has also led to an emphasis on
portfolio methods. A student portfolio includes a representative collection of the student's work over
a sustained period of time requiring concentrated effort and providing a perspective on …
For example, the variety and range of student
exhibits in a mathematics portfolio might include: written journals, biographies of investigation, student
conference presentations, student designs and inventions, investigative reports, physical or computer
mathematical models, videotapes, and/or reflective essays.

Interactive computer technology provides a very effective vehicle for assisting students in the
development and management of creative products, exhibits, and portfolios and assisting teachers in the
evaluation of these creative performances, exhibits, and portfolios.

3. Process Measures During Tool Use

Using integrated desktop software …, it is feasible to collect computerized process measurements
as students … These process measures can be used to evaluate student time
spent with various tools, sequences and patterns of tool use, … generalized learning and
problem solving strategies. The collection, analysis, and reporting of process … The future holds …
Currently, tools like word processors … The ease of changing drafts in the computer promotes
experience and instruction on review and revision.

4. Computer Applications That Integrate Assessment With Instruction

… recommending the need to integrate
assessment with instruction. In his preface to the third edition of Educational Measurement, Robert L. Linn

… Assessment," Educational Leadership, April 1988, pp. …

reiterates the historical need for integration of assessment with instruction.

"In my view, the biggest and most important single challenge for educational measurement today is
no different from what it was at the time the first edition of this book appeared; that is, to make
measurement do a better job of facilitating learning for all individuals. However, to date,
measurement has done a much better job of predicting who will achieve and of describing that
achievement than of helping teachers adapt instruction to enhance the learning of individual
students. The combined efforts of cognitive psychologists, measurement specialists, and educators
will need to be devoted to this task if educational measurement is going to become, not 'a process
quite apart from instruction, but an integral part of it.'"

Linn and Tyler are only two of the many measurement professionals who have written concerning
the need to integrate assessment with instruction. Two of the primary methods currently employed
for integrating assessment with instruction are Computer Managed Instruction and Integrated
Learning Systems.

Computer Managed Instruction. Computer managed instruction (CMI) systems use the computer
to record and manage much of the routine data associated with managing an entire classroom, school, or
district in which the students are working at differing instructional levels, with different curriculum
materials, and with differing achievement levels. A computer managed instruction system typically
consists of a bank of instructional objectives, a large item bank, lesson pretests, curriculum
exercises, lesson post-tests, and a bank of instructional prescriptions. Item banks are used to create
the required pre- and post-tests. CMI tests in the past were usually administered in paper-and-pencil
format with a computer-readable answer sheet. The answer sheets were scanned using desktop or
high-speed scanners or the score results were entered by teachers using a keyboard. When the tests are given

Michael E. Martinez and Joseph I. Lipson, "Assessment for Learning,"
Educational Leadership, vol. 46, no. 7, April 1989, pp. 73-76.

on learning workstations, the term CMI is not usually used. CMI systems may include networks among
several personal computers and … The mainframe-based CMI systems
such as Project PLAN, Individually Guided … of the
1960's and 1970's, promised to provide individualized education through widespread use of printed
curriculum materials and computer-scannable answer sheets. Each of these large-scale CMI systems has
been retired. Current CMI system implementations typically employ microcomputers linked through
modems and local area networks.

Integrated Learning Systems (ILS) were designed as integrated learning environments
combining assessment, instruction, and management within a single system. An ILS typically includes the
following hardware components: a central file server, mass data storage devices, local area communications
network, thirty or more personal computer workstations, and a printer. The primary software components
include: an instructional management system, comprehensive computer managed instruction, computerized
testing and assessment software, staff development activities, and instruction-related software tools.

The ILS environment provides students with opportunities to take computerized achievement tests
and receive appropriate prescriptions for mastered and non-mastered objectives. The prescriptions include
… and objectives. Student responses to the courseware lessons and computerized testing … are
monitored and teachers receive reports on individual students and class performance as students proceed
through the curriculum lessons or computerized tests at their own pace.
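
The link between test results and prescriptions for mastered and non-mastered objectives can be sketched
briefly. The example below is a hypothetical illustration, not the design of any specific ILS or CMI
product: it assumes each test item is tagged with one instructional objective, that mastery means at least
80% of an objective's items answered correctly, and that the prescription bank holds one "enrich" and one
"review" entry per objective. All names and the cut-off are assumptions.

```python
from collections import defaultdict

def prescribe(item_objectives, responses, prescriptions, mastery_cut=0.8):
    """Score a test by objective and look up a prescription for each objective.

    item_objectives: item id -> objective name
    responses:       item id -> True if answered correctly
    prescriptions:   objective name -> {"enrich": ..., "review": ...}
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item_id, is_correct in responses.items():
        objective = item_objectives[item_id]
        total[objective] += 1
        correct[objective] += int(is_correct)

    report = {}
    for objective in total:
        mastered = correct[objective] / total[objective] >= mastery_cut
        kind = "enrich" if mastered else "review"
        report[objective] = {
            "mastered": mastered,
            "prescription": prescriptions[objective][kind],
        }
    return report
```

A teacher report for a class could then be produced by running the same function over each student's
responses and tabulating the objectives not yet mastered.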

USE OF COMPUTERS IN PROGRAM EVALUATION

Computers play a critical role in collecting, analyzing and reporting data from local and national
evaluations of many educational programs (e.g., Chapter 1 and Chapter 2 of the Elementary and Secondary
School Act, National Assessment of Educational Progress, National Educational Longitudinal Study,
etc.). Program evaluation is conducted primarily to judge the worth and value of educational programs.
Educational programs are judged as valuable if they lead to significant improvements
as measured by student achievement test scores and other educational indicator variables. Data from
individual student and group pre-test scores are compared with data from student and group post-test
scores to determine if an educational program produces any significant achievement gains or losses.
Computer data base management systems and statistical packages are often used to store
evaluation data and to conduct statistical analyses of the individual and aggregated group achievement
results. National evaluations of educational programs also use computers to conduct nationwide statistical
analyses of educational program effects and outcomes.
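
The pre-test and post-test comparison described above can be illustrated with a small calculation. The
sketch assumes matched pre- and post-test scores for the same students and summarizes the gain with a
paired t statistic; this is one common choice, not necessarily the analysis used in any of the evaluations
mentioned, and the function name is hypothetical.

```python
import math
import statistics

def gain_summary(pre_scores, post_scores):
    """Summarize achievement gains for matched pre-test and post-test scores."""
    if len(pre_scores) != len(post_scores) or len(pre_scores) < 2:
        raise ValueError("need at least two matched pre/post score pairs")
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    mean_gain = statistics.mean(gains)
    sd_gain = statistics.stdev(gains)          # sample standard deviation of the gains
    # Paired t statistic; undefined when every student gains exactly the same amount.
    t_stat = mean_gain / (sd_gain / math.sqrt(len(gains))) if sd_gain > 0 else None
    return {"n": len(gains), "mean_gain": mean_gain, "sd_gain": sd_gain, "t": t_stat}
```

The t statistic would then be compared with a critical value for n - 1 degrees of freedom. A gain
that is statistically significant is not by itself evidence that the program caused it, since no
comparison group is involved in this simple design.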

Computer applications for program evaluation are generally found at the large district, state and
national levels. However, to the authors' knowledge, there is not a uniform national system or
computerized network for use of computers for program evaluation. If each district had a comparable
computer and network, it would be possible to collect the program evaluation data from each district and
aggregate the results at each state and from each state to the national educational agencies. New computer
applications should also be developed for assistance in district and state needs assessment, review of
proposed evaluation designs and instruments, statistical analysis, and database management of the program
evaluation data.

Use of Computers for Aggregating Individual Data

Computers provide ideal data collection, aggregation, analysis, and reporting tools. With local area
networking and long haul telecommunications capability, data concerning certain elements of student
achievement can be aggregated from the classroom to the school administration office, from the school
administration office to the district administration office, from the district administration office to the state
educational office, and from the state educational office to the national Department of Education. Data

Herbert J. Walberg and Geneva D. Haertel, The International Encyclopedia
of Educational Evaluation (New York: Pergamon Press, 1990).

base management systems can be used at each level of aggregation to store, query, retrieve, and report
results from various subsets of the program evaluation data. Statistical packages can also be used at each
appropriate level of aggregation to provide statistical analysis of educational program effects. Several
states (Florida, Ohio) and districts (Azusa, CA; Pharr-San Juan-Alamo, TX; and Anne Arundel
County, MD) are currently implementing district-wide and statewide information networking systems which
will facilitate aggregation and integration of student achievement data.
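
The classroom-to-school-to-district-to-state roll-up can be sketched as a single grouping function. The
record layout below (fields `state`, `district`, `school`, `classroom`, `n_students`, `total_score`) is an
assumption made for illustration. Carrying student counts and score totals upward, rather than averaging
averages, keeps the mean at each level correctly weighted.

```python
from collections import defaultdict

LEVELS = ("state", "district", "school", "classroom")

def aggregate(records, level):
    """Roll classroom-level records up to the requested level.

    Each record is a dict with the keys in LEVELS plus
    "n_students" and "total_score" (the sum of student scores).
    """
    depth = LEVELS.index(level) + 1
    sums = defaultdict(lambda: [0, 0.0])   # key -> [student count, score total]
    for record in records:
        key = tuple(record[name] for name in LEVELS[:depth])
        sums[key][0] += record["n_students"]
        sums[key][1] += record["total_score"]
    return {
        key: {"n_students": n, "mean_score": total / n}
        for key, (n, total) in sums.items()
        if n > 0
    }
```

For example, two classrooms of 20 and 30 students with score totals of 1400 and 2400 roll up to a school
mean of 76.0, not the unweighted 75.0 obtained by averaging the two classroom means.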

Use of Direct System Measures

The increased availability of networked personal computers and integrated learning systems provides
the foundation for development of direct measures of the effectiveness, efficiency and productivity of the
educational system. The computerized learning system can record each interaction the student has with
the computerized instructional and assessment system. These data are stored in a data base management
system for easy access, statistical analysis, and creation of customized or standard reports. Data from
student interactions can be used to calculate the following preliminary list of direct measures of system
performance: time on task, mean time to help, lesson and response duration, lesson effectiveness
evaluation, attrition, lesson avoidance, and estimated completion times. Additional direct system
performance variables should be hypothesized and investigated.
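
Two of the measures listed above, time on task and attrition, can be computed directly from a log of
student interactions. The sketch below assumes a hypothetical log format in which each event records the
student, the lesson, start and end times in seconds, and whether the lesson was completed; attrition is
taken here to mean the share of started lessons that were never completed, which is only one possible
definition.

```python
def system_measures(events):
    """Compute time on task per student and overall lesson attrition.

    Each event is a tuple: (student, lesson, start_sec, end_sec, completed).
    """
    time_on_task = {}
    started = set()
    finished = set()
    for student, lesson, start_sec, end_sec, completed in events:
        time_on_task[student] = time_on_task.get(student, 0) + (end_sec - start_sec)
        started.add((student, lesson))
        if completed:
            finished.add((student, lesson))
    # Share of started (student, lesson) pairs that were never completed.
    attrition = 1 - len(finished) / len(started) if started else 0.0
    return time_on_task, attrition
```

The remaining measures, such as mean time to help or lesson avoidance, would need additional event
types in the log (help requests, lessons offered but not opened), which this sketch does not model.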

Use of Long Term Educational Outcome Measures

The nation relies heavily on standardized item tests with items which can be answered quickly, with
one and only one correct answer, and which are generally independent and unrelated to other
test items. These are efficient and far less costly than searching for long term educational outcome
measures. Measures of success in employment rates after schooling and in productive accomplishments
(e.g., publications, patents) are difficult and costly to obtain. Emphasis has therefore been placed on short
term variables which might improve student scores on the multiple choice tests of scaffolding objectives.
Thus, we have focused on improvements in one type of learning indicator and have neglected examination
of long term improvements in learning.

The primary long term outcomes expected from K-12 education include basic skill competencies in
reading, writing, language, and mathematics for functional real world and job-related contexts; higher order
thinking, reasoning, creativity, and problem solving skills; and an increasing repertoire of alternative
learning strategies. Long term outcomes should emphasize applied capabilities and performance required
for later classes, higher education, and future workplace settings. Long term outcomes should emphasize
the need for lifelong learning and education.

To assist in providing closer links between the worlds of school and work, the U.S. Secretary of Labor
has established the Secretary's Commission on Achieving Necessary Skills (SCANS Commission). The
focus of the commission is to identify essential job-related skills for effective work performance. The initial
list includes twenty-eight functional job skills in the areas of resource management, information
management, social interaction, systems behavior and performance, human and technology interaction, and
affective skills. It is expected that by age 16, all students will have gained proficiencies in these workforce
readiness competencies. These competencies provide the potential foundation for measurement of long
term educational outcomes.

To illustrate the need for focusing on long term educational outcomes, consider the results from a long 
term research project with the Graduate Record Advanced Chemistry test. The project found a strong 
negative correlation between examinees' scores on the Advanced Chemistry test and the number of 
subsequent research papers and publications produced by the examinees. Graduate schools should 
consider whether they value efficient test-taking students more highly than those who emphasize research 
and publication. 

Michael Kane, Sue Berryman, David Goslin, and Ann Meltzer, The Secretary's Commission on 
Achieving Necessary Skills (Washington, D.C.: Pelavin Associates, 

"Long-Term Validity of the Advanced Chemistry Examination", ETS Research Report (1989) 

These data suggest the need to explore the variables which influence long term educational outcomes. 
Our typical short term perspective on educational outcomes may militate against what we really want to 
measure. 

CURRENT USES OF COMPUTERS IN DEVELOPMENT, DISTRIBUTION, AND ANALYSIS OF 
EDUCATIONAL ASSESSMENTS 

Since the mid 1960's computer technology has been employed in selective areas of testing and 
assessment. Test publishers have used mainframe or minicomputers to enhance productivity in the tasks 
of test construction, item banking, test printing, and test processing and reporting. Word processors are 
employed for item writing, editing, and review. Item banking programs are used to store large collections 
of items for easy access to item displays and item characteristics. Additionally, computers are used to 
facilitate test construction, editing, and review. After the test is developed, laser and color printers 
facilitate test printing and formatting. 

Computerized tools for content, job, and task analysis would be helpful to further define 
more complex, integrated, and motivating assessment and performance tasks. There exists a long standing 
separation between the content and processes taught in school and the content and processes required in 
the world of work. To reduce this gap, editable versions of SCANS commission functional skills and 
assessment scenarios could be made available to state departments and school districts. The states and 
districts could then customize these components to their own local needs and requirements. 

The benefits and limitations of computer uses in test development and reporting are presented in 
the table below. 

        Benefits and Limitations of Computer Uses in 
              Test Development and Reporting 

Technology Benefits                    Technology Limitations 

o Word processors used for item        o Limited number of integrated 
  writing                                test construction systems 
o Item banking programs for item       o Limited graphics editors 
  search, selection, and               o Lack of professional item 
  insertion                              interchange formats 
o Automated test construction,         o Limited computer experience 
  editing, and review                    of test developers 
o Item analysis and calibration 
o Improved test printing and 
  formatting 
o Increased flexibility and 
  ease of test [...] 
o Automated ordering and 
  distribution processes 
o Remote electronic registration 
o Improved test reporting 

Computer technologies have also been used efficiently in test registration and distribution. 
Using a touch telephone, computer access for remote registration and scheduling can easily be 
accomplished. The electronic registration information can be used to schedule the number of test 
administration sessions and to electronically download the test from a mainframe computer location to 
distributed testing center locations. These same procedures can be used to download tests from mainframe 
computers to personal computers. 

Summaries of Computer Uses in Test Development 

Excellent summaries of computer uses in test development, distribution, and reporting are provided 
in the following references. 

Test Analysis, Record Keeping, and Reporting 

High speed test answer sheet scanning machines scan and process answer sheets, score the tests, and 
store the information in a computer readable format. Mainframe or minicomputers are then used 
to process and analyze the testing information and to prepare printed reports for the individual students 
and groups tested. These mainframes and minicomputers are typically located at centralized test 
development, publication, and scoring service centers. Test publishers have used computer technologies 
to enhance their productivity in test construction, item banking, test printing, and test processing and 
reporting. 

Computers are used in test analysis, record keeping, and reporting because they provide for 
automation of time consuming and tedious human labor tasks. The computer can read, score, and store 
each of the item responses. Item analysis and item response theory statistics can be calculated easily, and 
the item and test statistic files can be automatically updated using only a few simple commands. 
Copies of test scores can also be easily made. Computers provide for a wide range of individual and group 
reports to be printed from the resulting test scores and profiles. Computerized interpretative reports have 
also been prepared for an increasing number of educational and psychological tests. 
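
As an illustration of how readily such item statistics can be computed once responses are stored in machine readable form, the following is a minimal sketch in Python. It is not taken from any testing company's software. The function names, the data layout (one list of answers per examinee), and the choice of a point-biserial correlation against the rest score as the discrimination index are all assumptions made for the example.

```python
from statistics import mean, pstdev


def point_biserial(item_scores, criterion):
    """Point-biserial correlation between a 0/1 item and a criterion score.

    Returns None when the statistic is undefined (no variance in the
    criterion, or everyone right or everyone wrong on the item).
    """
    p = mean(item_scores)
    sd = pstdev(criterion)
    if sd == 0 or p in (0, 1):
        return None
    right = [c for x, c in zip(item_scores, criterion) if x == 1]
    wrong = [c for x, c in zip(item_scores, criterion) if x == 0]
    return (mean(right) - mean(wrong)) / sd * (p * (1 - p)) ** 0.5


def item_analysis(responses, key):
    """Classical item analysis (hypothetical helper).

    responses: one sequence of answers per examinee
    key:       the sequence of correct answers
    """
    # Score every response as 1 (matches the key) or 0.
    scored = [[1 if given == correct else 0
               for given, correct in zip(row, key)]
              for row in responses]
    totals = [sum(row) for row in scored]

    stats = []
    for i in range(len(key)):
        item = [row[i] for row in scored]
        # Rest score: total with this item removed, so the item is not
        # correlated with itself.
        rest = [t - x for t, x in zip(totals, item)]
        stats.append({
            "item": i + 1,
            "difficulty": mean(item),          # proportion correct
            "discrimination": point_biserial(item, rest),
        })
    return stats


if __name__ == "__main__":
    key = ["A", "B", "C"]
    responses = [
        ["A", "B", "C"],
        ["A", "B", "D"],
        ["A", "C", "D"],
        ["B", "C", "D"],
    ]
    for row in item_analysis(responses, key):
        print(row)
```

Item response theory calibration requires iterative estimation and is beyond this sketch; the point is only that the classical statistics fall out of a few lines once the scanner output is stored.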

THE INFRASTRUCTURE FOR EDUCATIONAL MEASUREMENT 

The term "infrastructure" is used in this paper to refer both to the technological delivery system and 
the human expertise. The educational feeders for sustaining the flow of expertise are also part of the 
human infrastructure. For conventional paper and pencil testing, the infrastructure is in place. 
Computerized educational assessment requires the introduction of a new decentralized infrastructure. It 
requires permanent learning and assessment centers at schools and colleges. The mix and configuration 
of these centers between separate lab rooms and computerized classrooms has not been determined. 

Three different groupings of professionals currently make up the human infrastructure that is driving 
innovation and implementation of the precursors of CEA. These three are: 

1. Educational measurement professionals, and the current testing industry. 

2. Instructional technologists, trainers, and human resource development professionals. 

3. Computer technologists and users from many fields of endeavor. 

Each of these has its own strengths and weaknesses. The testing industry, which is both sustained and 
criticized by measurement scientists and related professionals, will receive the most attention in this 
review. 



A Critique of the Educational Measurement Infrastructure and the Current Testing Industry 

We have quoted the words of leaders in the educational measurement profession and have provided 
footnotes to document that these professionals are very concerned with the need to integrate assessment 
with instruction, but have not found the resources to shift very much of their research 
into the new priorities, nor have they introduced many new products. 

These professionals work at universities, school systems, at some industrial corporations, at the two 
major nonprofit centers, Educational Testing Service of Princeton, NJ, and The American College Testing 
Program in Iowa, and at for-profit testing companies. They constitute the current U.S. educational 
measurement infrastructure for research and development. Whether they distribute testing products or 
not, people in many organizations form a professional community with important collective wisdom for 
dealing with the complex issues of educational measurement. However, the tendency for this professional 
community to communicate to its in-group in mathematical and statistical terms tends to isolate it. 
An adversarial relationship aimed at the testing companies has developed from a growing number of 
politically active consumer groups. These advocacy groups have gained publicity by attacking different uses 
of standardized testing without demonstrating a grasp of the complexity of the assessment issues, or 
proposing sound alternative measurement solutions. 

Despite its contribution to the science and technology of measurement, the U.S. educational 
measurement infrastructure has not kept pace with trends and practices in the other two professional 
communities. As a result, instructional technologists are more likely to lead out in the development of new 
applications that integrate assessment with instruction. Neither have many measurement professionals 
kept up with computer and information technology. These trends have the potential to transform the kinds 
of human-computer interactions possible in education as well as the outcomes that can potentially be 
assessed. Thus, as the media for learning is transformed by advances in computing, our definition of 
literacy, education, and assessment must reflect this transformation. 

Current educational tests are defined in relation to interactions with printed matter, rather than in 
relation to dynamic interactive environments rich in visual and auditory displays, networks, 
or large computerized archives of information. Computer users from many disciplines are developing 
simulations, visualizations, and interactions that make new kinds of assessment (such as Help Systems) 
possible. These innovators, however, are not cognizant of educational measurement issues and practices. 
The need is great to link knowledge acquired from interactive multimedia research with testing and 
instructional development practice. 

How Much Leadership Can the Testing Industry Provide? Professional testing companies have 
important strengths, but are the target of much criticism. They are national resources of expertise in 
measurement science, and often ascend to statesmanlike leadership on education issues where their 
expertise is strong. They promote and try to abide by high professional standards for quality and fairness. 
Where they maintain significant numbers and quality of researchers, they enlighten the national debate 
with factual data. They also may pioneer new item types, new computerized tests, and ways to 
integrate assessment with instruction. 

The publications of NAEP, in general, and of the new ETS policy 
information center have a refreshing accessibility. 

Various anti-testing advocacy groups attempt to portray them as villains, but their villainy is often 
illusory. The tests bring bad news about the educational progress of minority groups versus white males, 
and this is taken as proof of bias in the tests. It may only indicate failure to provide the educational help 
in homes and schools that children from each group need. 

A more valid criticism results from constraints these companies operate under. It is not a conscious 
policy on their parts, but is very real. Professional testing companies exert a strong conservative and 
inertial force in slowing the pace of acceptance and implementation of computerized testing and innovative 
instructionally-oriented systems. Whether organized as for-profit or not-for-profit, these companies make 
their income from selling paper test booklets, paper answer sheets, and in scanning and reporting 
paper-and-pencil multiple choice tests. Companies are thus very reluctant to adopt computerized testing; they 
are also reluctant to be the first to announce a computerized standardized achievement test. Several 
companies have been willing to invest in research and development in computerized testing, and have 
developed some computerized testing products for item banking, localized test scoring, and aptitude 
testing. However, in the absence of a proven market and delivery infrastructures, none of the major 
professional test publishers have been willing to announce development or release of a computerized 
standardized achievement test. 

Professional testing companies continue to rely on mainframe and minicomputer technology for test 
development, research and reporting tasks. In general, these organizations have not emphasized building 
expertise in microcomputer technology, or in innovative display and response technologies. 

Several forces act on testing companies to keep them locked into this conservative posture. 

1. Testing companies serve clients, not end users. States, professional organizations, and membership 
organizations mediate between testing companies and the test-taking public. As with any business, the 
client is much in control of what types of new testing instruments are developed and deployed. Clients 
can lead out and when they do, the testing companies are responsive. The innovative National Council 
of Architects Review Board is sponsoring R&D that is potentially very significant. The College Board and 
the GRE board have sponsored some forward-looking research in CEA. The College Board is distributing 
computerized placement tests, and innovative microcomputer-based testing systems. 

An increasing number of states, districts and Canadian provinces have expressed interest in 
computerized testing. Testing companies will be responsive to these clients, but technology companies may 
persuade these clients that they can provide a faster, less expensive solution. The statistical and scientific 
quality standards that the testing companies adhere to are hard to explain and hard to sell. 

2. Standards for validation, quality, equating and fairness slow down innovative projects. Professional 
testing companies should be commended because of their continuing strong commitment to professional 
test development research and validation standards and to standards of fairness and quality. However, 
these same commitments to professional standards requiring exhaustive research and validation tend to 
repress innovation, creative solutions, and exploration with the use of new technologies for testing. It will 
simply cost too much before it can begin to yield a return on investment. 

The idea of formative research, starting with a partial system and evolving it over time based on 
field experience, is fraught with too much risk to companies who are judged by the unchallengeable quality 
of each product as it comes out the door. 



3. Testing companies are very concerned about legal challenges. They have had to fight 
challenges to the use of their products in certain high-stakes areas (employment selection is perhaps the 
most hotly litigious). They have not perceived that low-stakes products like help systems are so 
fundamentally different that legal liabilities may be minute. 

4. Testing companies do not want to be accused of developing instruction for their own tests. The 
makers of a high-stakes test are in an awkward position if they also develop the instructional help products 
to prepare people for those tests. 

5. Competitive pressures and project pressures prevent organizations from changing. The test 
developers are not given time by their managers to try out new item types or delivery options. They must 
make their quotas of items, or else schedules and budgets will not be met. The innovative developer who 
takes time off to work on a research project, even if funded from another part of the company, may not 
be promoted as readily as those who keep fully occupied on bread and butter tasks. These tasks are to 
develop dependable, accepted and valid multiple choice tests of the highest quality the world has ever 
known. 

This observation is not unique to testing companies. They are business organizations, albeit with high 
ideals. They must meet their clients' schedules and produce an income to survive and thrive. 

6. The capital investment required is too high. The capital investment in new modes 
of testing means high costs for R&D that must be diverted from improving the bread-and-butter printed 
tests, high costs for restructuring the company internally and retooling people expert in paper processing 
to become good at computers, and an expensive, missionary type of selling to convince people to install 
hardware with the features of systems in order to run the new tests. Such investments are 
questionable, to say the least, for a supposed market that does not have an infrastructure in place. 

7. Measurement professionals have an academic mistrust of business. The more academically 
oriented a testing company is (and the non-profits tend to be quite academic), the less comfortable they 
feel about embarking on new business strategies that involve high amounts of capital investment and risk. 
Hardware for permanent testing centers, and a new technically literate human infrastructure, is very costly. 

The Probable Evolution of Current Testing Products 

Figure 1 shows some possible evolutionary progressions in the delivery of answer sheet/item test 
measurement instruments. A test requires stimulus presentations, and response entries correlated to the 
displays. Printed booklets and answer sheets dominate, but some tests use audio-visual media. Listening 
and language tests require audio or video tapes, or their equivalents. These group testing modes 
(individual response, group pacing) will give way to group presentation systems with either answer sheets 
or electronic response devices. Help systems (practice with feedback worksheets) can be implemented with 
paper, although computer delivery has far more benefits, but at greater costs and loss of portability. 
Notebook computer-like devices offer both the functionality and the portability of printed tests. 

Probable Evolution of Answer Sheets. Answer sheet systems should evolve and will do so. Because 
of the established infrastructure, both technological and human, and the low cost, the portability, and the 
public familiarity with these systems, it is desirable that actions be taken to promote the continued 
evolution of answer sheet systems. It is probable that answer sheet systems will continue to evolve as 
scanners provide higher resolution and as new item types beyond multiple choice are developed to utilize 
these systems. Test answer sheets are divided into several hundred "bubbles" -- small ovals or rectangles 
where the scanner looks for a mark. Most test developers use this type of sheet only for multiple choice, 
but it is possible to use an answer sheet for other item types. For example, by providing a grid for each 
item in a math test of 4 columns and 13 rows of bubbles, the rows respectively designated by the symbols 
"-" (minus), "/" (divide), "." (decimal), or a digit from 0 to 9, students could enter arithmetic expressions such 
as 9/16, 62.8, 46-7, and many others into this grid without selecting from 4 or 5 multiple choice 
options. 

in measuring at the high end of ability. The erroneous responses 
by most students were not usually the distractors invented by the 

It is also possible to arrange paragraphs so that the words, printed in non-scannable ink, each fall 
across several bubbles. Then words can be marked or lined out by the student and these selections can 
be scored. Recent research at ETS has resulted in a new family of "Figural Response Items". 
These items utilize high-resolution optical scanners that pick up tiny picture elements (pixels) from lines 
and marks drawn by the student. The computer reconstitutes the locations and configurations of these 
pixels and compares them with specifications in a scoring key. The figural response items presented to 
the student contain pictures on the answer sheet. The student marks a part of the picture (e.g., mark the 
nucleus in the cell), draws an arrow (e.g., which way will the ball travel when it emerges from the 
tube?), or draws a line (e.g., where could you cut the flatworm to yield the pictured cross-section?). 
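
The 4-column by 13-row response grid described earlier for math items can be decoded with very little logic once the scanner reports which bubbles are marked. The following Python sketch is illustrative only. The row order ("-", "/", ".", then 0 through 9), the function names, and the rule that a column with two marks invalidates the entry are assumptions, as is the decision to treat an entry such as 46-7 as a subtraction and to score by numeric equivalence rather than by exact characters.

```python
from fractions import Fraction

# Assumed row order: three symbols, then the digits 0 through 9 (13 rows).
ROW_SYMBOLS = ["-", "/", "."] + [str(d) for d in range(10)]
N_COLUMNS = 4


def decode_grid(marks):
    """Turn a 13 x 4 grid of booleans (marks[row][col]) into a string.

    Blank columns are skipped. Returns None if any column carries more
    than one mark, since the entry is then ambiguous.
    """
    chars = []
    for col in range(N_COLUMNS):
        marked = [ROW_SYMBOLS[row]
                  for row in range(len(ROW_SYMBOLS)) if marks[row][col]]
        if len(marked) > 1:
            return None
        chars.extend(marked)
    return "".join(chars)


def to_value(entry):
    """Evaluate a decoded entry as an exact number, or None if malformed."""
    if not entry:
        return None
    try:
        if "/" in entry:
            numerator, denominator = entry.split("/")
            return Fraction(int(numerator), int(denominator))
        if "-" in entry[1:]:
            # A minus sign after the first character is read as subtraction.
            left, right = entry.rsplit("-", 1)
            return Fraction(left) - Fraction(right)
        return Fraction(entry)
    except (ValueError, ZeroDivisionError):
        return None


def score_grid_item(marks, key_value):
    """Return 1 if the gridded entry equals the keyed value, else 0."""
    value = to_value(decode_grid(marks))
    return 1 if value is not None and value == key_value else 0


def grid_from(entry):
    """Build the mark grid a student would fill in for `entry` (for testing)."""
    marks = [[False] * N_COLUMNS for _ in ROW_SYMBOLS]
    for col, ch in enumerate(entry):
        marks[ROW_SYMBOLS.index(ch)][col] = True
    return marks


if __name__ == "__main__":
    print(decode_grid(grid_from("9/16")))
    print(score_grid_item(grid_from("9/16"), Fraction(9, 16)))
```

Scoring by value means that equivalent entries (for example .5 and 1/2) earn the same credit. A test developer who wants a particular form of the answer would compare the decoded string instead.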

The art of handwriting recognition is developing, and will eventually mature to the point where 
printed letters and numbers can be recognized from scanned answer sheets. Figure 1 depicts these 
developments. 



Answer Sheet Systems in the Classroom. Answer sheet systems can certainly participate in the 
transformation of learning and instruction through integrating assessment with instruction. Teachers can 
use computers to display both instruction and assessment in the classroom. Students can mark 
their responses on answer sheets, which could be scanned on an optical scanner attached either to the 
teacher's computer or to a multi-purpose computer at the back of each classroom. The measurement 
activity could be quickly scored and could be made a part of a timely group discussion. 

Possible Leadership from the Instructional Science and Technology Community. 

The second professional grouping, instructional technologists and those in training and human resource 
development who use similar methods, have a greater likelihood of providing leadership in assessment 
integrated with instruction than does the testing industry. Like the measurement professionals, not many 
of them have kept pace with developments in technology and few are strong in educational measurement. 
The new breed of cognitive and instructional scientists offer many exceptions to this generalization, at least 
as regards technology. The subgroup who have engaged technological problems most resolutely are 
generally known as the Integrated Learning Systems industry. 

Current Status of the ILS Industry. An industry has been created for Integrated Learning Systems. 
It has grown out of earlier work in computer managed instruction and computer assisted instruction, 
including drill and practice systems. The industry is made up of a group of small specialized ILS companies 
and a group of large computer manufacturers. The small companies sell software, hardware, and services. 
They sell hardware as dealers, primarily, although one of them (Wicat Systems) began as a vertically 
integrated company that developed hardware, software, courseware, and provided services. Wicat Systems 
has since offered its educational courseware and services for delivery on commonly available PC platforms. 

The ILS industry is different from the industry for educational software that will run on stand-alone PCs. 
Stand-alone operation is a different concept from a system which integrates instruction, management, and 
testing through one networked configuration. Several of the ILS companies are systems integrators and 



Winton Manning, Development of cloze-elide tests of English as a second 
language, ETS Research Report RR-87-18, Princeton, N.J.: Educational Testing 
Service, 1987. 

Martinez, J. Ferris, William Kraft, Winton H. Manning, 
Automatic scoring of paper-and-pencil figural responses, ETS Research Report, 
1990. 

Martinez, A comparison of multiple-choice and constructed figural 
response items. Paper presented at the meeting of the American Educational 
Research Association, Boston, MA, April, 1990. 

integrate hardware, software, and service, and install the ILS systems in the schools as a part of their 
overall fee. This group of small specialized ILS companies includes Jostens Learning Corporation, 
Computer Curriculum Corporation, Wicat Systems, Plato Learning Centers, Wasatch Systems, New 
Century, and others. The distinguishing feature of this industry is the possession of substantial curriculum 
materials that operate on the integrated learning systems and cover entire subject matters over several 
years of the K through 12 curriculum. The set of small specialized ILS companies have found a market 
niche that is growing rapidly. Many labs and learning centers are being installed in schools. The group 
of small ILS companies probably accounts for about $300,000,000 per year in volume, while the 
large computer manufacturers, of which IBM, Apple, and Tandy are the most prominent, control over 
$600,000,000. IBM is the most vertically integrated and largest of these and offers courseware, software, 
and hardware under its own label. Much of the courseware and software has been purchased or 
contracted from small software companies, consortia, and from individuals. 

The large computer manufacturers install labs consisting of personal computers of their own make. 
Sometimes these PCs are stand alone, but increasingly they are networked into a central file server for 
use in a variety of educational activities using computers. This includes instruction in computer science 
and programming; instruction in word processing, spreadsheets, and other business productivity 
tools; writing labs integrated with writing instruction in a variety of classes; and desktop publishing and 
graphics labs. 



The Previous Evolution of Integrated Learning Systems. It was difficult enough to learn how to 
develop interactive instruction, then integrate it with CMI-like testing and with instructional management. 
Now forces in the marketplace, and voices in the scientific community, are calling for a better and deeper 
kind of assessment than the computerized item tests provide, and for a deeper form of integration with 
instruction. 

Some of the evolutionary threads that have led up to today's ILS systems have a continuing influence 
today. The Plato system was developed in the College of Engineering at the University of Illinois, starting 
in 1969, under the direction of Dr. Donald Bitzer. Plato introduced some of the first multi-terminal labs 
at learning centers in schools. These were supported by large mainframe computers made by Control Data 
Corporation (CDC), but have in recent years been replaced by networked personal computers in the 
numerous schools and colleges using this system. There was a brief merger between Plato and Wicat to 
form the Plato/Wicat Company, and not too long after this merger failed, the Plato labs were acquired by 
The Roach Organization, which markets and supports them today. Plato was championed for many years 
by William Norris, chairman and founder of CDC, who as head of the Norris Institute today is a 
statesman and leader in the movement toward transformed schools through the use of integrated learning 
system technology. 

The Stanford Institute for Mathematical Studies in the Social Sciences pioneered drill and practice 
labs that originally used noisy teletypes (this worked extremely well in schools for the deaf). This work 
was soundly based and thoroughly researched, and led to the founding of Computer Curriculum 
Corporation, an integrated learning system provider that continues as a strong player today. The Stanford 
Institute also influenced the design of the IBM 1500 system, which in many ways was the earliest 
prototype of the twenty- to thirty-terminal integrated learning system labs found today. This system was 
discontinued in the early 1970's, but was very influential in building the human infrastructure for 
Integrated Learning Systems. 

The IBM 1500 system was one of the parents of the TICCIT system, completed in 1976 under 
National Science Foundation funding. TICCIT offered a thirty-two terminal integrated learning system. 
The other "parent" was the technology of interactive two-way television over cable, developed by the 
non-profit MITRE Corp. The TICCIT system was designed by a team that balanced the contributions of 
instructional scientists, technologists and engineers. It was built around a coherent instructional model 
and a coherent instructional management model. Hazeltine Corporation acquired TICCIT from the prime 
contractor, MITRE Corporation, after the NSF funding ended. Hazeltine in turn sold it to Ford Aerospace. 
It was recently acquired by a larger training and simulation company. The TICCIT system used testing 
intimately integrated with instruction and with management, and had an early form of computerized 
mastery testing. 

Another early ILS company besides Plato and CCC was Wicat Systems. The Wicat Integrated 
Learning System drew on some of the experience with the Stanford drill and practice systems and on the 
TICCIT computer-aided instruction system. Wicat Systems also benefitted from some of the earliest 
interactive videodisc work. Wicat's current business consists of two parts: training systems, especially 
simulators for airline and other industrial training, and education systems, integrated learning systems 
installed in hundreds of schools. Wicat invested substantially in computerized testing and has several 
batteries of computerized achievement tests that may be integrated with most curricula. With help from 
foundation grants, Wicat's nonprofit institute also developed a battery of "learner profile" tests, which 
included many innovative item and test formats. 

Labs for general computer use in schools represent another evolutionary trend. These labs have 
primarily been equipped with stand-alone computers used for programming, computer literacy, and now 
word processing, and other business productivity tools. So long as these labs consisted of stand-alone 
computers, they had little potential for computerized assessment or integration of assessment with 
instruction. However, as many of them have become connected to file servers with network software, it 
becomes feasible to use them as learning and assessment centers. 

The Possible Future Evolution of Integrated Learning and Assessment Systems. Figure 2 depicts 
some possible trends in the evolution of integrated learning and assessment systems. As mentioned above, 
stand-alone computers could not integrate management or measurement, but could give individual lessons 
and provide opportunities for tool use. The networked labs were of two kinds: integrated learning systems 
and specialized computer labs. Off to the side are special simulators, such as those found in driving classes 
and vocational classes, and two-dimensional simulation software programmed on interactive videodisc 
systems. 

Currently there is a movement to place integrated learning systems in classrooms instead of in 
separate lab rooms. There are three forces driving this: 

1) Educators have difficulty taking an entire classroom out of service (... 
but this is not their perception) so they wish to spread the terminals out among the classrooms. 

2) Educators feel that the teachers will become more involved if the computers are in the classroom. 

3) Computer manufacturers can sell more hardware this way. For example, one thirty-terminal lab 
versus fifteen classrooms equipped with four terminals each presents an obvious short-term 
business payoff. 

The thirty-terminal lab has many advantages for computerized educational assessment because of the 
security, standardization, and monitoring required in high-stakes testing sessions. The system 
administrator in the learning center can manage testing sessions. The group presentation computer at the 
front of the room has advantages for integrating instruction with assessment, and so does the learning lab. 
The most difficult configuration is classroom PCs. Four terminals in a classroom will go unused most of 
the time because the entire group must be attentive to the group activities. Texas Learning Technology 
Group has measured the use of both teacher-controlled presentation systems and four or five classroom 
PCs. The group system achieves 46% utilization while the stand-alone PCs are used under 10% of the 
time. 

Putting integrated learning system hardware and software as presently designed into classrooms is 
probably not a good idea. It carries with it a management model for individualized instruction, but 
classrooms are managed primarily for group instruction. Therefore, Figure 2 and Figure 1 both predict 
the development of group-paced assessment and instruction technologies. The teacher controls the 
computer with a file of visual materials, including videodisc simulations and scientific visualizations. 
Assessment questions are integrated with the presentations and students may respond either on answer 
sheets used for practice and feedback, or in the future, on small electronic devices like notebook 
computers. These devices have potential uses in individualized instruction, individualized testing, 
homework, and tool use, as well as group response. 

SUMMARY

Current uses of computers in assessment have primarily been substitutive and incremental, but there
are several transformational possibilities. This section has considered computer-administered item tests
and incremental improvements to them, standardized performance tasks, computers as tools to get at
process assessment and feedback, computer-enhanced display, and computers that integrate
assessment with instruction and management. To achieve the benefits of any of these applications,
whether incremental or transformational, a new infrastructure must be established in the schools to put
the tools into the hands of the students and the teachers. Discussion of the probable evolution of both
answer sheet systems and interactive workstations introduces us to the future of CEA, where
group-oriented systems integrated with labs and with portables can prove transformational. In Section III,
we turn to further elaboration of these future possibilities.

The last section showed that current uses of computers in assessment are primarily substitutive and
incremental in terms of the technology diffusion model. Transformational uses were introduced in Section
II; these include the use of computers to administer standardized performance tasks, and to provide
feedback and helps related to process during tool use. Other transformational possibilities included
graphics, animation, and multi-media for deep visualization of difficult concepts, and perhaps most
importantly, the integration of assessment with instruction and with management. This section will
concentrate on the transformational applications and will discuss scientific and technological developments
underway and anticipated that can lead toward these transformational possibilities.

TWO FUTURE SCENARIOS FOR COMPUTERIZED EDUCATIONAL ASSESSMENT

Scenario 1. Three students are clustered around a video screen. Each holds a notebook-sized
computer with an invisible infrared data linkage to a computer that is displaying interactive graphics and
video on the screen. They are managing a simulated McDonald's restaurant franchise in their city.
Complex management issues are presented to them, with the emphasis on financial decisions. The
students perform some calculations on their notebook computers, then register their decisions. Two agree,
but one has a different decision. The system feeds this information back to the students, clearly displaying
the differences in both decision process and result. The students argue for a while, then agree with the
one student and continue with the simulation. The two saw their different errors. The system,
meanwhile, has recorded the arithmetic errors made by one of the students and the critical reading error
that led another to use a wrong process. The system updated the learning progress map of the one in the
domain of the mathematical concepts of fractions, and the critical reading map of the other. When each
student re-enters their respective map a comment will be provided to each, designed to
prescribe tutorial help with fraction and critical reading concepts, and advice on how to make
judgments in the future.

Although Scenario 1 can be seen as an integration-level test of rapid judgement requiring financial
calculations, the students do not regard it as a test. No grading has occurred, but the experience is
perceived by the students as an opportunity to debug their thinking skills. They were deeply engaged and
motivated by practicing in a simulated area [...] Traditional midterm and final tests are [...]
experience, working another unencountered problem alone in a simulated environment [...]
computerized mastery test of scaffolding-level objectives. They knew they could practice any of these tasks
in advance of the high stakes final exam.

One of the students, as it happened, was a high school junior working on a more extensive written
report on the franchise simulation to put in her portfolio, as part of the admission requirements to a
prestigious business school. Using the word-processor in her notebook computer, she added a few thoughts
to her report, gleaned from this day's discussion with fellow students.

Scenario 2. A group of sixth grade geography students in the midwest are typing short responses
[...] The video involves NASA satellite shots. Answers are entered into their notebook computers via an
infrared link to the main computer at the front of [...] representative sample of students at 499 other
schools nationwide), will be used to evaluate progress in geography instruction.

[...] clustered together into small sets with a meaningful context. The sets differ in cognitive demand
and can be placed on a learning progress [...] students are motivated to perform well [...] results, will
show the class' standing on national and state [...] common errors, neatly listed and sorted by the
computer. Those with the greatest desire to [...] mistakes will go over the missed items (stored in their
notebook computers) that [...]

The parents know that the students will be bringing assignments home recorded in their portable
notebook-sized computer. The PTA has been working with the parents to encourage them to work with
these students at home. The computers at school remain on after hours. Telephone and modem linkages
enable parents to dial up for further information about the student's progress on learning progress maps
that visually display the curriculum for the student that year.

Administrators know that they have to strictly avoid the use of learning progress data while holding
the teachers or students accountable or grading them. They are content to know that an outstanding
accountability test will be given at the end of each block and at the end of the year. They are comfortable
in knowing that the curriculum is aligned with [...] and that the teachers can tell them how the progress
data is looking. They have statistical machinery that allows them to predict how well each class will be
doing as a group on the assessment at the end of the year. Moreover, they have formative evaluation
information [...] might not be doing well on the annual assessment.

Curriculum developers use the national item analysis results gathered from 500 selected schools to
update tutorial exercise sets in geography that provide hints and helps for each of the common errors
found in the analysis of the over 10,000 student response vectors. These sets will go out to participating
schools in both printed and electronic form, depending on the delivery system at a particular school.
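
The item analysis described above, in which common errors are listed and sorted by the computer, can be sketched in a few lines. This is a minimal illustration, not a description of any existing system: the function name `common_errors`, the answer key, and the four response vectors are all hypothetical.

```python
from collections import Counter


def common_errors(responses, answer_key, top_n=3):
    """Count wrong answers per item and return the most frequent ones.

    responses  -- list of response vectors, one per student
    answer_key -- list of correct answers, same length as each vector
    """
    tally = Counter()
    for vector in responses:
        for item, (given, correct) in enumerate(zip(vector, answer_key)):
            if given != correct:
                tally[(item, given)] += 1
    # Sort by count (descending), then by item and answer for a stable listing.
    ranked = sorted(tally.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical data: three items, four students.
answer_key = ["B", "D", "A"]
responses = [
    ["B", "C", "A"],
    ["B", "C", "A"],
    ["A", "D", "A"],
    ["B", "C", "B"],
]
for (item, given), count in common_errors(responses, answer_key):
    print(f"item {item + 1}: wrong answer {given!r} given {count} times")
```

A tutorial exercise set could then attach one hint to each of the highest-ranked (item, wrong answer) pairs rather than attempting to anticipate every possible error.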

Both scenarios are consistent with the recommendations in this report. To move successfully in this
direction will require progress in scientific foundations, technological tools, and policy.

SCIENTIFIC TRENDS 

Trends in Cognitive and Instructional Science

The first scenario described above incorporates some guesses about where Cognitive and Instructional
Science will position assessment and instruction in the future. Progress is being made in the analysis of
cognitive processes for use in instructional design, individual diagnosis, and feedback. It takes years of
deep cognitive analysis in specific task domains (e.g., the franchise management task) to be sensitive
enough to provide the diagnosis of the fraction error and the critical reading error described in the
scenario. The use of mass response collection methods illustrated in scenario 2 might lead to sets of
common errors that will provide a more pragmatic, if less accurate, approach to the problem of building a
responsive and reasonably intelligent learning environment with built-in awareness of the most important
pitfalls.

The power of "impasses" - the evidence presented in a problem situation indicating that a selected
approach does not work - is understood in cognitive science. Impasses are opportunities to learn.
Students are challenged to try using a better approach. If the system contains help so powerful that all
with desire can learn, at least the most common impasses will be followed by hints, cross-references, or
examples to guide thinking toward that better way. Current intelligent tutors that can generate the best
help for each learner are even more Utopian than what is envisioned here. We predict the development
of empirical methods that can aid in discovering the important impasses quickly, then lead to tutorials that
teach one or two correct approaches. We do not envision a "super-diagnoser" of any possible error model
that a confused but creative mind may have generated.

The power of social context and group effects is very much on the scientific agenda for cognitive
science today, and scenario 1 honors this by using group discussions among the three students, as well as
one-to-one interactions. Meaning that will transfer to new situations is "negotiated" through group dialogues
(and inside one's own head, as in the reflection afterward by the girl working on her portfolio). Useful
knowledge structures are not memorized from books or computer screens.

Respect for integration, transfer, and creative production also characterizes cognitive science today.
In scenario 2 even the scaffolding items of geography instruction were integrated into testlets containing
more meaning and context than individual items. In scenario 1 a standardized simulation task was used
to teach an integration objective. The impending final exam question using an unencountered problem
would provide a test of knowledge transfer. The portfolio building activity described for one student offered
creative production and transfer.

Learning and Cognition are situated in real human and physical contexts. Providing more
project-like activities and small working groups introduces more of this reality into school settings.

Metacognition and learning strategies are always difficult to address. The situation is improved by
introducing simulation exercises and computer tools. Learning strategy is a dull subject without a specific
context, without specific "rules of some game". Generally speaking, about the best one can do is some
variant of "SQ3R" (Survey, Question, Read, Recite, Review). Strategies become vital and interesting within
the context of some "game" like chess, franchise management, using an outline processor or a spreadsheet
program. Interdisciplinary expertise can be shared, and strategies and tactics for accomplishing different
goals can be discussed within these specific domains. [...] for metacognition - reflecting on [...]
Specific tools and tasks with a formal structure and a syntax can be used to promote
this kind of discussion and self-reflection. That is why the proposed emphasis on strategy objectives,
practiced and perhaps eventually assessed within performance tasks and computer tool use, is
recommended.

Conative and self-management objectives are at least as important as cognitive objectives. Conative
objectives deal with motivation, persistence, commitment. There must be research and development aimed
at breaking out of the cycle of dislike and avoidance for conventional testing and instructional practices.
Students need help and confidence-building experiences leading up to challenges that they respect and
value. The two scenarios depicted the achievement of new attitudes toward finding one's own errors - then
fixing them. New student attitudes toward self-management and self-motivated writing, reflection, and
learning were also depicted in both scenarios. The reasons for claiming that CEA can help are given in
the section on achievement constructs, below.

Trends in Measurement

In scenario 2, two recent developments in measurement science are assumed. One is a national scale
of learning progress. A candidate for developing such scales is the Hierarchically Overlapped Skills Test
(HOST) developed by Don Bock. It is based on clusters of items (sometimes called testlets) that
contain increasing levels of cognitive demand, each one including the previous level. The constructs
underlying each level are understood by the testlet developers and can be illustrated and taught. A
student's position on the scale, unlike a norm-referenced ranking scale, has instructional utility. The
student knows what he/she can do, and what must be learned next. Other new scales diagnostic of
learning progress are also under development.



John Seely Brown, "Toward a new epistemology for learning", in C. Frasson and J. Gauthier (eds.),
Intelligent Tutoring Systems at the Crossroads of AI and Education. Norwood, NJ: Ablex, 1989.

Bock (1989).

Yamamoto and Gitomer, 1989.

Howard Wainer & G. L. Kiely, "Item clusters and computerized adaptive testing: A case for testlets".
Journal of Educational Measurement, 1987, 24(3), pp. 185-201.

Howard Wainer and Charles Lewis, "Toward a psychometrics for testlets", Journal of Educational
Measurement, 19[...]



The theory and practice of using testlets (instead of individual items, which do not carry enough
context) is the second recent development. Measurement science has developed powerful machinery
for dealing with item tests. However, it has not yet developed equivalent machinery for measuring
constructs inferred from the highly variable responses made during a flexible performance task, like a
computer-controlled simulation, nor during tool use while producing an exhibit. Item clusters may use
familiar item types, yet group them into contexts that can represent levels of cognitive demand, subsets
of content or process, aspects of procedure, aspects of strategy. Banks of such clusters can be sampled
using adaptive techniques to gain much valuable knowledge about how each student is progressing.
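
To illustrate what sampling a bank of item clusters adaptively might look like, the sketch below moves a student up or down among levels of cognitive demand according to the proportion of items answered correctly in each testlet. It is a toy rule under stated assumptions, not the HOST procedure or any published testlet model: the thresholds (0.8 and 0.5), the function names, and the bank layout are all hypothetical.

```python
def next_level(current, proportion_correct, n_levels, up=0.8, down=0.5):
    """Pick the level of the next testlet from performance on the current one."""
    if proportion_correct >= up and current < n_levels - 1:
        return current + 1
    if proportion_correct < down and current > 0:
        return current - 1
    return current


def administer(bank, answer_fn, start=0, max_testlets=4):
    """Walk through a bank of testlets grouped by level of cognitive demand.

    bank      -- list of levels; each level is a list of testlets (lists of items)
    answer_fn -- callable(item) -> bool, True if the student answers correctly
    Returns the sequence of (level, proportion_correct) pairs observed.
    """
    level, history = start, []
    used = [0] * len(bank)  # index of the next unused testlet at each level
    for _ in range(max_testlets):
        if used[level] >= len(bank[level]):
            break  # no unused testlet left at this level
        testlet = bank[level][used[level]]
        used[level] += 1
        score = sum(answer_fn(item) for item in testlet) / len(testlet)
        history.append((level, score))
        level = next_level(level, score, len(bank))
    return history


# Hypothetical bank: items are represented only by a difficulty number.
bank = [
    [[1, 1, 1], [1, 1, 1]],
    [[2, 2, 2], [2, 2, 2]],
    [[3, 3, 3], [3, 3, 3]],
]
# Hypothetical student who can handle difficulty 2 but not 3.
history = administer(bank, lambda difficulty: difficulty <= 2)
print(history)
```

The record of levels visited is a crude stand-in for a position on a learning progress scale; a real system would need a measurement model for testlets rather than fixed thresholds.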

Developing models for incorporating time as a measurement variable constitutes one of the most
important trends in measurement science. Many aspects of time may be measured, including the time it
takes a student to complete a particular task, the time it takes to complete a whole set of tasks, the
amount of time something is displayed for a student to observe. In the past, descriptive statistics on time
intervals have not been kept because of difficulties in precisely timing individual items and component
parts of items. Today, with computerized testing and computerized instrumentation, measuring time is easy.
However, little research has been done in this area. The relation of time to all sorts of proficiencies in
many fields and for many thought processes is a subject for research that will occupy thousands of
graduate students and scientists in the future.
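
As a small illustration of how easily time can be captured once tasks are computer-administered, the sketch below times a task and summarizes a set of item latencies with descriptive statistics. The helper names `timed` and `latency_summary` and the sample latencies are hypothetical; only Python standard-library calls are used.

```python
import statistics
import time


def timed(task):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = task()
    return result, time.perf_counter() - start


def latency_summary(seconds):
    """Descriptive statistics for a list of per-item response times."""
    return {
        "n": len(seconds),
        "mean": statistics.mean(seconds),
        "median": statistics.median(seconds),
        "stdev": statistics.stdev(seconds) if len(seconds) > 1 else 0.0,
        "total": sum(seconds),
    }


# Hypothetical latencies (in seconds) for four items in one task.
item_times = [12.0, 8.0, 30.0, 10.0]
summary = latency_summary(item_times)
print(summary)

# Timing a stand-in "task"; a real system would time the student's response.
answer, elapsed = timed(lambda: 6 * 7)
print(answer, f"{elapsed:.6f} s")
```

The per-item times and the task total correspond to two of the aspects of time named above; display duration would have to be logged separately by the presentation software.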

Research Integrating Assessment and Instruction

Successfully integrating assessment with instruction has been the goal of measurement scientists for
decades. Enormously difficult to accomplish, it involves the integration of many scientific disciplines:
Developmental Psychology, Cognitive and Instructional Science, human and organizational disciplines like
Organizational Behavior and Anthropology, as well as Measurement Science. Since science and
measurement are inseparable, it is vital that the interdisciplinary nature of the field be recognized and
actively promoted by today's educational measurement science leadership.

In the past six years, there has been a flurry of activity in the field of educational measurement
around the integration of assessment with instruction. For example, within Educational Testing Service
in Princeton, NJ there is a recognition that this integration represents a fertile area for research.
Benefitting from a broad range of consulting expertise and collaboration with universities and private
organizations, some promising new developments have emerged. A number of these are described in two
forthcoming books. In addition, a new monograph commissioned by ETS proposes a research and
development agenda for integrating assessment with instruction. Through an extensive literature review
and analysis, Richard Snow of Stanford University and Ellen Mandinach of ETS seek to identify what can
now be answered through research, organized around four general questions:

1) What constitutes learning progress toward mastery in an instructional domain?

2) What constitutes diagnostic assessment of learning progress for instructional use?

3) How might performance tasks that provide such assessment be designed and evaluated?

4) How might collections of performance tasks be mapped into an instructional domain to guide
instructional adaptation?

Frederiksen, N., Glaser, R., Lesgold, A., and Shafto, M., Eds. (1990), Diagnostic Monitoring of Skill
and Knowledge Acquisition. Hillsdale, NJ: Erlbaum.

Frederiksen, N., Mislevy, R., and Bejar, I. (in press), Test Theory for a New Generation of Tests.
Hillsdale, NJ: Erlbaum.

Snow, R. E., and Mandinach, E. B. (1990, Pre-publication draft), Integrating Assessment and
Instruction: A Research and Development Agenda.



The recommendations Snow and Mandinach make for Systems That Integrate Instruction and
Assessment (SIIA) are consistent with several of the major recommendations made in this paper. In
particular, research and development practice must move as rapidly as possible in the direction of learning
progress help systems. Snow and Mandinach further point out that in the old paradigms (e.g., CMI) the
instructional tasks and tests remain distinct from instruction. They are hopeful that the field is maturing
to the point where integration of the two can be successfully addressed. However, they point out with
needed caution that theories of learning progress and its diagnosis for teaching in an instructional domain
are not now available. Their agenda for research and development is therefore a long-range one.

Measurement science will be challenged as never before to stretch into previously uncharted research
areas. New kinds of growth scales are needed based on sophisticated cognitive and developmental theories,
rather than the overly simplistic and erroneous model that curriculum is comprised of equivalent snippets
of knowledge. Moreover, scoring methods for complex integrative tasks, performances, and tactical
sequences in tool use will open up new possibilities for computerized assessment. Promising models that
mix growth measurement with diagnosis of different strategies are now being developed. In a way,
measurement science is entering an era pregnant with possibilities for the explosion of new models and
methods, much like the era of L. L. Thurstone when many new methods for the testing of primary mental
abilities were pioneered.

FUTURE PROGRESS IN INSTRUCTIONAL SCIENCE AND MEASUREMENT SCIENCE DEPENDS
ON THE DEVELOPMENT OF GENERALIZED ACHIEVEMENT CONSTRUCTS

Educational and psychological tests measure human aptitudes, traits, or achievements. None of these
are visible physical quantities that can be observed with the five senses; therefore, before appropriate
items or performance tasks can be developed, we must "construct" a mental picture with words or images
to define what we are trying to measure. These ideas are called "constructs." In aptitude testing,
psychologists and measurement scientists have developed such constructs as spatial ability, induction, and
deductive reasoning. Psychological tests measure constructs of personality and of clinical pathology.
Admissions tests measure verbal and mathematical ability developed over previous learning and schooling.
In achievement measurement, there is no generally agreed-upon set of constructs, yet differing conceptions
have a great impact on research, teaching, and on how the public views educational goals.

As was mentioned earlier, the predominant uses of educational measurement earlier in this century
were for sorting and selection based on intelligence or aptitude. The historical shift from sorting and
selecting to promoting growth in learning is correlated with the shift away from aptitude measurement
toward achievement measurement. It is worthwhile to understand how aptitude constructs and test
formats influence achievement testing today.

Contrasting Aptitude Measurement and Achievement Measurement

Aptitude measurement cuts across many contexts. When we are measuring an aptitude or
developed ability it is appropriate to sample from many small performances with little specific context.
Decontextualized items get at abilities that are general - they are applicable to many contexts. Aptitude
testing is certainly one of the most successful and widespread applications of behavioral science, at least
when success is measured by how widely it is used in military, industrial, and educational settings. The
widespread use of aptitude testing has generated much criticism about its validity, equity, and
inappropriate uses. From the perspective of this paper, one of the most damaging impacts of aptitude
testing is that its very success has promoted extensive copying of its item types and methods in the quite
different field of achievement testing.



Nancy S. Cole, "Conceptions of educational achievement". Educational Researcher, April 1990, Vol.
19, No. 3, pp. 2-7.

College admissions tests, of which the Scholastic Aptitude Test (SAT) is
the most widely used, are no longer said by their makers to measure aptitudes -
innate and difficult-to-change traits of individuals, but "developed abilities."
A "developed ability" is subject to schooling. It is assumed that the SAT
measures verbal and mathematical abilities that have been gained through studies
and exercises in many subjects, as well as by activities outside of the school,
taken over the twelve years of schooling. Developed abilities take many months
and years to develop.

Aptitude testing methods should not be copied for tests of achievement. There is a tendency on the part
of teachers (taught to them in college measurement courses) to copy the item types and test formats from
aptitude testing for use in classroom achievement testing. Aptitude test formats carry with them as their
conceptual baggage the "high-stakes" frame of reference (ranking and grading) instead of the help service
mentality. Copying them also results in a definition of the curriculum as actually delivered to the students
that is more fragmented and decontextualized, not integrated and clearly relevant to valued total performance.

Achievement Measurement is Always in Context. Unlike aptitude constructs, which utilize
decontextualized items, achievement measures are always in a subject-matter context and a value context -
valued performances worth doing. We achieve at valued performance tasks within the domains of history,
science, art, automotive maintenance, etc. The public gives "face validity" to curricula that
resemble valued tasks or products in life and work, but the prevailing theory in educational circles is that
we should not make the context too specific, else we are "training" rather than "educating". However,
cognitive scientists are now insisting that valuable and persistent learning is always situated in a
meaningful context.

Achievement Measures are Derived from a Curriculum Plan, and from a [...] Theories and
values about what knowledge is lead to definitions of an achievement domain, and a curriculum is a plan
of what must be covered, and in what rough order, within that domain. Typical practice when specifying
a curriculum is to begin with content outlines (topics, not performance tasks), then develop a set of
objectives. These reflect the prescriptions for good performance objectives found in instructional
technology text books. Unfortunately, these objectives tend to emphasize too strongly simple verbal
knowledge rather than integrated performances. Later, when teachers are confronted with a fairly
extensive curriculum outline, usually provided by their district, they pick out the objectives they feel they
can cover, snipping them, as it were, from the horizontal fabric of the "quilt" that represents the topical
domain. At worst, this practice results in a "snippet curriculum" where little pieces of factual knowledge
are presented unconnected to one another and unrelated to the complex and nested thinking processes
used by those knowledgeable in the domain. This is, of course, a caricature of the worst in curriculum
implementation. The curriculum outline built by this process may be quite thoughtful and integrated (but
usually it has far too many topics in it). As implemented by busy teachers who must read the topical
outlines and make selections to fit into the time available (and to fit what they are knowledgeable to
teach), the best curriculum guide may be reduced to a "snippet curriculum" when implemented in a
particular class.

Even the best curriculum guide that follows the practice of writing performance objectives of a
primarily verbal nature may produce tests and instruction that are far from what is desired and needed
by our culture of the 1990's.

Achievement Constructs are Needed in Order to Develop Better Instruction and Measurement
Instruments. Instructional scientists have been groping toward a generally agreed-upon set of constructs
that cut across subject matters and permit the prescription of the more promising approaches to teaching
and learning. Taxonomies of educational objectives have been the most visible manifestation of this search
for constructs useful in instructional development.

Curriculum developers have been guided by two main taxonomies of objectives. An objective is a
prescription for writing performance items or tasks, so the objectives selected, and the constructs behind
them, determine the measurement instruments used to assess achievement in a curriculum. One such
taxonomy of educational objectives was developed by Benjamin Bloom and his co-workers. It includes
such constructs as memory, application, analysis, and synthesis. Another taxonomy is based on Gagne's
model, which includes constructs such as memorization, concepts (classification behavior), rule using, and
problem solving. Instructional developers who use these taxonomies are likely to develop objectives at
higher levels of processing than those who use no model or taxonomy, only prescriptions for writing
behavioral objectives. These prescriptions have an implicit model of achievement. The
model is that knowing verbal information "about something" is the predominant objective or kind of
behavior that can and should be measured.

The problem for all developers is that there is a paucity of examples of test item and task [...]
can be developed and used within the constraints of the print delivery system. The item types in the
developer's workshop of tools are inherently dominated by short instructions on a printed page, and
short quickly recorded and interpreted responses on another printed page. Aptitude test items offer a
strong example for developers to follow.

CEA is Fundamental in Measuring and Teaching Achievement. Multimedia presentation
and dynamic interaction capabilities of Computer-Administered Tests are more fundamental in getting at
different achievement constructs than would be seen at first glance. They are far more than a more
interesting and motivational presentation.

This paper adopts the perspective that the measurement methods of standardized performance tasks,
student-developed exhibits, and process measures during performance permit curriculum developers to tap
processing well above the level of the typical print-mediated verbal test item. This is the crux of the
argument for using Computerized Educational Assessment: CEA takes us beyond the print delivery
system to scorable interactions with more realistic, contextualized tasks. Such tasks are necessary in
assessment of more important, higher-order constructs than can be measured and implemented using the
print delivery system alone.

Table 8 is the embodiment of this central argument. It adds another dimension to the four
measurement methods: the dimension of generalized constructs to describe individual achievement. We
will call these "achievement constructs" and note that they are related to curriculum objectives at a level
of generality higher than is found in the familiar taxonomies of educational objectives.



Benjamin Bloom, Taxonomy of Educational Objectives. Handbook I: Cognitive Domain (New York,
NY: McKay, 1956).

Robert Gagne, The Conditions of Learning, 4th ed. (New York, NY: Holt, Rinehart and Winston,
1985).

The four achievement constructs found in the four rows of Table 8 are defined as follows:



1) Scaffolding knowledge is verbal knowledge about some topic (terminology, definitions,
classifications, rules).

2) Integrated performance is the capability to perform well in more complex tasks requiring the
student to use or apply the scaffolding knowledge, along with common-sense and task-specific skills,
to solve some problem, operate some real or simulated equipment, perform some experiment, write
up a carefully specified paper, or make a carefully specified presentation. Standardized performance
tasks generally take far longer to complete than short verbal items.

3) Creative production and transfer of performance capability to a new situation. Creative
production objectives require students to transfer both scaffolding knowledge learned
previously, and previously demonstrated ability to integrate, into a new situation. The project task
requires them to use their knowledge and skill to design and produce some written or mediated
presentation or some product.

4) Strategy improvement. Strategies are of two kinds: learning strategies and performance
strategies. The former applies to techniques students use when confronted with learning tasks for
the different types of scaffolding knowledge, or learning to perform an integration task well, or
learning to create a new kind of exhibit. Performance strategies are specific to some well-defined
task, game or tool to be used. Strategy improvement is a construct that must be inferred from more
proficient learning of a particular type, and from more efficient choice of strategy and tactics in the
process of performing or producing.

All four achievement constructs have both cognitive and conative aspects. The cognitive aspects
include knowledge and proficiency. The conative aspects include motivation, commitment, desire, and
persistence for achieving at high levels. Conative objectives are of utmost importance. We often pay lip
service to them but do not try to assess them. The importance of conative objectives leads to a corollary
to the central argument of this paper: Instruction with integrated, unobtrusive measurement using all of
the four kinds of performance tasks will be more motivating and less aversive to students, offering
promising new approaches to the achievement of conative objectives. Consider the negative attitudes and
avoidance toward testing in particular and toward studying in general (found among many of today's
youth). Can a new family of help systems with interesting, challenging performance tasks and assignments
to produce exhibits lead to more engaged and challenged students? As another possible benefit, can
unobtrusive measurement (e.g., measuring use and avoidance) help identify attitudes toward learning and
persistence, leading to ways to improve them?

As with the achievement objectives, the use of computers in achieving conative objectives is
fundamental, not incidental. There are several reasons for this. First, it has been demonstrated
repeatedly that interactive instruction, especially when video discs and video are involved, is very
engaging, interesting and motivating to most students. Second, the misplaced emphasis on "getting
the right answer the first time" (a no-win situation) might change. Most students regard an error as a sign
of inadequacy, rather than as an immediate opportunity for improvement and self-development. Some
educators have similar defensive attitudes. While teachers still have to create an environment to permit
a different conception toward errors, computer use presents a different model: finding a bug or error in
a computer program is a part of the process, not a reflection on the individual. To find a bug is a welcome
thing; perhaps it means that you have now caught that last trash fish out of the deep trout pond. Perhaps
the debugging metaphor can improve attitudes towards learning from errors. Third, the use of computer
productivity tools to develop student exhibits brings with it direct instruction about the [...]
Once again, the students are taught actively that errors are not bad, but a part of the process. Finally,
good performance in complex tasks, and products that can be exhibited in a portfolio [...]
adults and peers, not just the questionable validation of a grade.

TECHNOLOGICAL TRENDS

Predictions of Technology Trends

The literature on future technology trends is very large and spans both popular magazines on
personal computer trends, and magazines and journals associated with computer science and engineering.
Among the latter, the IEEE Spectrum (Institute of Electrical and Electronics Engineers) has an annual
update on technology trends.*** This large and varied technical literature cannot be reviewed here, but
some themes clearly have potential significance for the field of assessment:

o Group process and team building skills will become more important. The increasing use of
interactive computer technologies by groups is much talked about, including networking, electronic
mail systems, and the like. "Groupware" is a new buzzword. Groupware is defined as:

Computer-based systems that support groups of people engaged in a common task (or goal) and
that provide an interface to a shared environment.****

It can refer to face-to-face interaction that occurs at the same time in the same place, or it can refer
to interactions that occur at different times and in different places.

Education is a field well positioned to take advantage of groupware. It is organized around 



**Gary Borich, The Texas Learning Technology Group Evaluation Report, Austin, Texas, 1988.

***"What's new in products and applications", IEEE Spectrum, Institute of Electrical and Electronics
Engineers, Jan. 1991. It contains news of a 64 million bit memory chip that will be in mass production for
products such as notebook computers in 4 years. See also "Fifteenth anniversary summit", Byte, McGraw
Hill Publications, September 1990. To commemorate its fifteenth anniversary, Byte magazine assembled
computer industry leaders to make their forecasts of personal computing.

****Ellis, C.A., Gibbs, S.J., and Rein, G.L. Groupware: Some Issues and Experiences.
Communications of the ACM, Volume 34, #1, January 1991, pp. 39-57.

classrooms; rituals and traditions are deeply ingrained, such as appearing in class at a scheduled time
and communicating verbally and nonverbally with other students and teachers. Effective groupware
could be built around interactive display generators and projectors, with files of materials for
explaining difficult concepts using animations, scientific visualizations, games, and simulations
that can be displayed and discussed by the class. Students equipped with handheld response units
add yet another dimension to the potential group experience. We have developed Scenario 2 to
incorporate this trend toward group interaction. The ramifications for group problem solving, team
building and gaming are virtually endless when technology is employed in this manner, but as yet,
it is an area that remains poorly developed in education.

o Use of computer networks will become more widespread, easy to use, and increasingly [illegible]
over long distances. This is promising for record-keeping and statistical functions in CEA systems.
Stand-alone personal computers do not lend themselves to CEA. The expansion of networks opens
up the possibility of access to assessments stored at numerous host computers with delivery to
individuals in homes and workplaces. Because of the security and privacy issues associated with
high-stakes assessment, the kinds of assessment made available in this manner will most likely be
help systems where disclosure is not an issue.

o Multi-media information display and interaction is a booming trend now and will continue. Within
two decades, the world has witnessed a remarkable development in graphics [illegible]
capabilities. High-resolution interactive graphics, video, animation, and graphical user interfaces are
becoming familiar to millions of computer users. Visualization techniques offer a new window on the
previously unseen. Speech recognition is an important goal that will become extremely important
for assessment and instruction. Digitized video on magnetic disks is a current reality that will
become more available and inexpensive in the future. A merging of video and computer technologies
is expected, as well as the integration of computer and communications technologies.

o Portable computers, especially notebook computers, will continue to [illegible] and
[illegible]. As discussed in connection with Figure 8, the notebook computer offers this same
portability and versatility, and far more functionality than printed answer sheets. Notebook
computers are sometimes equipped with handwriting detectors, minimizing the importance of
keyboarding skills in computerized [...]

o Computer tools will extend our definition of literacy. Widespread accessibility of low-cost personal
computers and production software like word processors, graphics design tools, spread sheets,
equation solvers, etc. will increase and will define a new kind of literacy. This new literacy will
become a part of the definition of an educated citizen in the modern world.

ADMINISTRATION OF FUTURE CEA SYSTEMS

Trends in Paper Testing

Fine-grained scanners and advances in image processing will open up a new range of scannable item
types beyond multiple choice. Several of these were discussed in Section II, including figural response

[illegible] Schwartz, "Computing the anatomy of the brain", Pixel, the Magazine of Scientific
Visualization, Vol. 1, No. 1, Jan/Feb 1990, pp. 20-27. In the premier issue of [illegible]
subject, visualized neuroscience models "take us into the skull" to give us insight into the brain's
map-making ability.

[illegible], "Visualizing the collision of a star with a black hole", Pixel, the Magazine of
Scientific Visualization, Vol. 1, No. 8, July/Aug 1990, pp. 24-29. A short time later [illegible]
takes us out into the distant universe to "see" a rare astronomical event.

items (marking, drawing lines and arrows) and cite-elide (marking key words or inappropriate words in
a sentence or paragraph). These item types are easily implemented using [illegible] Portable
answer media, in particular scannable answer sheets, are by no means restricted to 4- or 5-alternative
multiple choice items.

The following response formats include those categories of response entry which can be accomplished
on both answer sheets and electronic media, and those only possible on electronic media.

Response entry:

o Select from a limited set of choices by marking a box on an answer sheet, or point to one of
several choices and click with mouse, cursor, moving field, or finger. Type the number or letter
associated with the alternative selected on a keyset.

o Mark a part of a picture or line out a word on a high-resolution scannable answer sheet. (Also
possible on a bubble answer sheet if objects and words are carefully positioned over the bubbles.)
Use a mouse, joystick, or trackball to do the same things on a computer-controlled display.

o Grid-in a few digits and symbols on an answer sheet, or type the same on a keyset.

o Print by hand on high-resolution answer sheets, or print with a stylus on a notebook computer or
sensitive pad.

o Use a keyboard to enter character strings.

o Move a cursor among a set of letters and words, selecting them and dropping them to a line
where the response string of characters is being built. Signal when completed.

Drawing lines and shapes:

o Recognize line segments or directional arrows on high-resolution answer sheets.

o Recognize line and shape information drawn with mouse, light pen, finger, or stylus on a sensitive
surface.

Vocal utterances:

o Digitize and recognize a given set of words.

Duration of response:

o Measure and record the latency before the response begins (thinking time).
o Measure and record the duration of the response; its entire composition.

Measure pressure, velocity, and direction:

o Use joystick, steering wheel, or other special interface device to input force, velocity, and direction
information into the computer, and adjust displays accordingly to provide immediate visual feedback
to the user.
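
The two duration measures listed above (latency before the response begins, and duration of the response itself) can be illustrated with a minimal sketch. It assumes that the testing station records the time an item is displayed and a timestamp for each input event; the names ResponseTiming and time_response are hypothetical and are not taken from any system described in this paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ResponseTiming:
    latency: float   # seconds from item display to the first input event (thinking time)
    duration: float  # seconds from the first input event to the last one


def time_response(shown_at: float, event_times: Sequence[float]) -> ResponseTiming:
    """Derive latency and duration from an item's display time and its input timestamps."""
    if not event_times:
        raise ValueError("no input events were recorded for this item")
    first, last = min(event_times), max(event_times)
    if first < shown_at:
        raise ValueError("an input event precedes the display of the item")
    return ResponseTiming(latency=first - shown_at, duration=last - first)


# Hypothetical item shown at t=10.0 s, with keystrokes at 12.5 s, 13.0 s and 15.0 s.
timing = time_response(10.0, [12.5, 13.0, 15.0])
print(timing.latency, timing.duration)
```

Keeping latency and duration separate allows thinking time to be reported apart from the time spent composing the response.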

Trends in Interactive Electronic Devices

All the response possibilities discussed for high-resolution answer sheets, vocal utterances and
response times are possible on Educational Testing Stations as well as Laboratory Workstations. In
addition, the display options that follow are increasingly being made available. Expense is a major factor
in both display and response options, but costs have been decreasing rapidly in the memory requirements
to generate the displays, in display drivers and monitors, and in response devices.

Trends in Information Display

Text: Multiple character fonts are available and the future will see a huge array of standard and
customized fonts. Students can reflect their own stylistic "signatures" in [illegible] Graphical
user interfaces assure that "What You See Is What You Get"" (WYSIWYG).

Line graphics: Increased resolution, drawing, and animation capability will put drawing power into
the hands of students and teachers for illustrated exhibits. Color displays will become easier to
produce, edit, and display.

Photographic images: These are increasingly easy to scan, digitize and manipulate for student
exhibits and teaching materials.

Digitized Audio will be increasingly inexpensive and available for musical backgrounds (and
foregrounds), and voice reproduction. Synthesized voice and music will be so available and
inexpensive that it can be used increasingly both by materials developers and students for their
own exhibits.

Video (motion): The technologies of video and interactive computers will merge.

Control of time intervals in displays will be used for many instructionally relevant purposes, e.g.:
testing of aptitudes for perceptual speed; pacing for practice in tasks in which speed is valued (e.g.,
typing).

Richness and realism in performance tasks and simulations will increase greatly in response to the
dramatically improving capabilities of user interfaces. These capabilities, when coupled with the low cost
and wide availability of "desk-top multimedia publishing", will enable more educational groups to develop
their own materials, or at least to customize existing materials. Indeed, individuals "in their garages",
having knowledge and experience in some valuable area (perhaps acquired as a hobbyist), will be able to
develop performance tasks and games with potential for instructional use. This will occur independently
of the authors having knowledge or expertise in cognitive, instructional, or measurement science. Thus
the results will be of mixed quality.

Notebook computers for each student

The trend toward reduced size and cost and increased capability has frequently been noted in the
press, with innovations appearing almost monthly in computer magazines and newspaper ads. Computer
companies will continue to increase notebook capabilities so that the price will not have to drop too low.
For educational applications, modest processing capability and modest resolution monochrome displays can
be produced relatively inexpensively.

This will make it possible to provide students in the classroom with response devices that can be used in
a variety of ways:

1) To answer questions (i.e., Keyway system) on a printed worksheet, at the student's own pace.
The notebook can then be taken to a file server at the front of the room for immediate scoring.



^^Doug Engelbart, BYTE, Sept., 1990.

2) Utilizing telecommunications technology or a computer network, group assessment questions can
be projected onto a large screen at the front of the room by the teacher, who can access an archive
of interesting display and assessment tasks. As students enter in short answers or selections to
limited choice items, limited choice or figural response items, [illegible]
to the whole class. When the items are a constructed response, the teacher can highlight and project
particular answers that are worthy of group discussion.

3) As in Scenario 2, these group response devices can be used to collect pretest data, calibration
data, and formative evaluation data from a sample of classrooms around the country.

4) More speculatively, with two-way response and a microphone, students could receive copies of
visual materials, stored in a notebook computer along with a digitized version of the teacher's
presentation. Each student could then go back and review the group activity using the hypertext
graphical user interface, insert additional notes, edit existing notes, and erase what is not [...]

5) Also more speculatively, at the end of the day, the teacher could transmit individualized
assignments from a teacher's workstation to individual notebook computers. These tasks could be
customized to the needs of each student, through individualized learning progress calculations made
by the computer.

The evolution of notebook computers for the four tasks listed is considered an important, but distant
reality in the evolution of CEA systems. Groupware for education will continue to evolve, and will also
include some of the features of special lab work stations, interactive video simulators, and even special
simulators that could generate rich and highly effective instructional displays.

Special Simulators

As discussed in Section II, simulators are currently in heavy use in both military and industrial
settings. They are highly cost-effective when a hierarchy of simulators is used to replace instructional time
spent in actual equipment or in more expensive simulators. Thus a typical hierarchy of cost and complexity
in pilot training for advanced aircraft is:

o At the top, it costs $[illegible],000 per hour for flying the [...]

o Next, it costs $700 per hour for time in the four-dimensional movement simulator that moves
about, tilts, accelerates, and simultaneously presents a visual display of the ground and sky while
a trainee operates actual controls in the realistic cockpit.

o Third, the position trainer costs $200 per hour. Here the pilot sits in a nonmoving mockup of
the cockpit and learns the position and function of dials and controls.

o $20 per hour is the cost for time spent in the two-dimensional videodisc simulator. This is a
personal computer controlling a color display and videodisc player. Many tasks at the scaffolding
level and some integration tasks can be practiced in the environment of the two-dimensional
simulator.

The cost benefit calculations with hierarchies of simulators are easy to compute. The more training
objectives that can be accomplished in the lower cost simulators without [...]
It is now the case in the training of commercial airline pilots that their first [...]
passengers. Their experience in the hierarchy of simulators has been shown through
many trainees to prepare them adequately for their first flight as a copilot.

Note that validation of the objectives at each level is accomplished through performance [at] the
integration level in the next higher simulator. That is, one simulator is validated by time reduction in the
next most complex simulator. The final step is the actual work itself.
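
The cost comparison across a simulator hierarchy can be sketched as a short calculation. The hourly rates for the three simulator levels are those given above; the hourly rate for the actual aircraft is not legible in this copy, so the value used below is a placeholder chosen only for illustration, and the hour allocations are likewise hypothetical.

```python
# Hourly rates for the three simulator levels, as given in the text.
SIMULATOR_RATES = {
    "motion_simulator": 700,
    "position_trainer": 200,
    "videodisc_simulator": 20,
}

# Placeholder only: the aircraft's hourly cost is illegible in the source.
ASSUMED_AIRCRAFT_RATE = 3000


def training_cost(hours_by_level, rates):
    """Total cost of a training plan: sum of hours multiplied by hourly rate per level."""
    return sum(hours * rates[level] for level, hours in hours_by_level.items())


rates = dict(SIMULATOR_RATES, aircraft=ASSUMED_AIRCRAFT_RATE)

# Hypothetical plans: all 40 hours in the aircraft, versus 10 hours at each level.
aircraft_only = {"aircraft": 40}
hierarchy = {
    "aircraft": 10,
    "motion_simulator": 10,
    "position_trainer": 10,
    "videodisc_simulator": 10,
}

saving = training_cost(aircraft_only, rates) - training_cost(hierarchy, rates)
print(saving)
```

Under these assumed figures, every hour moved from a higher level to a lower one reduces cost by the difference between the two hourly rates, provided the training objective can still be met at the lower level.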

It is interesting to speculate about what the future may bring for educational assessment. It is
frequently the case that used hardware can be donated to schools by industry (only that which is not too
costly to maintain). Vocational schools can be re-tooled to include and maintain costly manufacturing
equipment, robotics equipment, etc. donated by industry. In a majority of classrooms, however, we will
be fortunate to move up to the computer-controlled videodisc player to be used by the teacher in a group
mode, and by a few students at appropriately scheduled times during the day and evening. Excellent
simulations of complex integrative tasks can be ad[...] There might be a hierarchy
of less expensive electronic devices below the simulator which prepare students for integration tasks
presented on videodisc-based simulators.

Standardized performance tasks

In the discussion of trends in multimedia display and response capabilities described above, the
opportunity for developing many more standardized performance tasks was discussed. It should be
mentioned in this respect that advances in software [illegible] Not
only will "multi-media desktop publishing systems" become available to produce more standardized
performance tasks, but advances in software development like object-oriented programming will help. Full
performance tasks, or parts of them, could be programmed as reusable objects that could be used in
different ways in the production of modified or improved performance tasks.
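
As a minimal sketch of what reusable objects could mean here, the example below composes a modified performance task from the parts of an existing one. The class names TaskStep and PerformanceTask, and the sample prompts, are hypothetical illustrations and do not describe any existing authoring system.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class TaskStep:
    prompt: str           # what the student is asked to do
    response_format: str  # e.g. "keyboard", "figural", "limited choice"


@dataclass(frozen=True)
class PerformanceTask:
    title: str
    steps: Tuple[TaskStep, ...]

    def with_step(self, step: TaskStep) -> "PerformanceTask":
        """Return a new task that reuses the existing steps and appends one more."""
        return replace(self, steps=self.steps + (step,))


# An existing task, and a modified version that reuses its parts unchanged.
base = PerformanceTask("Plot a data set", (TaskStep("Enter the values", "keyboard"),))
variant = base.with_step(TaskStep("Mark the outlier on the graph", "figural"))
print(len(base.steps), len(variant.steps))
```

Because the objects are immutable, the original task is left intact and its steps are shared by the modified version rather than copied.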

Unfortunately, as educational games and simulations are developed by those without adequate
background in instructional design and measurement science, there will be a plethora of interactive
performance tasks that do not fit well into any curriculum and that violate important principles of
instruction and motivation, without clear instructional links. As Lepper and his colleagues^^ have pointed
out:

"Their designs have often violated sound principles of learning and motivation theory. Without
clear instructional links, they become discovery learning problems. Many students are unable to benefit
from the implicit instruction."

Expert systems techniques offer considerable promise for future scoring of computer-based
performance tasks, and for computer-generated diagnostic feedback. A collaboration
between measurement scientists at Educational Testing Service and [illegible]
and [illegible] has led to promising demonstrations in the field of computer programming. High school
students learning the Pascal programming language were assigned standardized tasks calling on them to
write a program. An expert system program called Proust scored a set of such programs previously scored
by humans in the Advanced Placement Test for Computer Science. The expert systems were able to
produce scores for between 82% and 96% of the solutions, with high agreement with a human reader on
the correctness of the solutions^\. This team of researchers is seeking ways to integrate this powerful
computerized assessment model with instruction^'. Artificial intelligence methods are used to provide
partial-credit scores and diagnostic analyses on each item the students work. Cognitively-based
measurement models are used to generate diagnostic statements based on commonalities in performance

^^M. R. Lepper and T. W. Malone, "Intrinsic Motivation and Instructional Effectiveness in Computer-
Based Education", in R. E. Snow and M. C. Farr (Eds.), Aptitude, Learning and Instruction, Volume 3:
Conative and Affective Process Analysis (Hillsdale, NJ: 1987), pp. 255-286.

See also E. B. Mandinach, The Role of Strategic Planning and Self-Regulation in Learning an
Intellectual Computer Game. Published doctoral dissertation (Stanford, CA: Stanford University, 1984).

^\Henry I. Braun, Randy Elliot Bennett, Douglas Frye, and Elliot Soloway, Scoring constructed
responses using expert systems. Journal of Educational Measurement, Summer, 1990, Vol. 27, No. 2, pp.
93-108.

^'Randy Elliot Bennett, Toward intelligent assessment: an integration of constructed response testing,
artificial intelligence, and model-based measurement. Research Report RR-90-5, Educational Testing
Service, May, 1990.

across tasks. Models for diagnostic feedback [illegible] come out of work wh[...] Scoring of
complex constructed responses to standardized tasks in mathematics is another area showing promise.

Tools for Student Products

In the OTA technology diffusion model, the third stage is to go beyond substitution and
incremental improvement to the introduction of whole new concepts. Process measures during tool use
as part of a help system are perhaps the premiere example presented in this paper of such an evolution.

Students increasingly use outline processing software, word processing software, and desk-top
publishing to produce their own creative products. The classroom environment is suitable for continuous
measurement and feedback. Thus, virtually all of the students' responses and response times while using
software tools can be monitored for the purpose of providing learning progress help. In this case,
measurement can be used to monitor the use of different tactics and to suggest sequences of commands
or activities that the students have overlooked, forgotten or not learned. Before on-line measurement is
in place, the students can print out intermediate and final products and submit them to teachers and other
graders, including fellow students, for constructive feedback. This peer review models the process that
professionals often use in developing their own products in the workplace. Ratings and feedback can be
abstracted by the students and summarized in their own portfolios.
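
The kind of unobtrusive monitoring described above can be illustrated with a minimal sketch. It assumes that a software tool writes a time-ordered log of (timestamp in seconds, command name) pairs and that the full set of commands the tool offers is known; the function name summarize_tool_use and the sample command names are hypothetical.

```python
from collections import Counter


def summarize_tool_use(events, known_commands):
    """Summarize a time-ordered log of (timestamp_seconds, command) pairs.

    Reports how often each command was used, the mean time between successive
    commands, and which known commands never appear in the log.
    """
    counts = Counter(command for _, command in events)
    gaps = [later - earlier for (earlier, _), (later, _) in zip(events, events[1:])]
    return {
        "counts": dict(counts),
        "mean_gap": sum(gaps) / len(gaps) if gaps else None,
        "unused": sorted(c for c in known_commands if counts[c] == 0),
    }


# Hypothetical word-processing session.
log = [(0.0, "type"), (4.0, "type"), (6.0, "cut"), (12.0, "paste")]
commands = {"type", "cut", "paste", "outline", "spell_check"}
summary = summarize_tool_use(log, commands)
print(summary["unused"])
```

The list of unused commands is one simple basis for suggesting activities a student may have overlooked. Deciding whether a command was forgotten or never learned would need more evidence than a single session log provides.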

In the emerging portfolio culture now being developed in leading schools, portfolios include not only
favored completed products, but also personal journal information where the students learn to audit their
own learning processes during production of exhibits. These reflections also become part of their growing
portfolios.

Two examples of promising systems developed for high stakes testing, that nevertheless demonstrate
the potential for integrating assessment with tool use, are under development at Educational Testing
Service. The first of these deals with the simulation test being developed with the National Council of
Architectural Registration Boards (NCARB); the second is a word processing test being developed by ETS in
cooperation with the KBE corporation.

In the NCARB simulation, a computer-aided design-like system is provided to the candidate for a
certificate in architecture. The system is very user-friendly and resembles the office of a practicing
architect. A reference book can be selected from a shelf on the desktop to review statistics, construction
standards, etc. On a separate screen, the architect can use some design tools to begin laying out a house
design according to a standardized specification given in the test instructions. There is a lot of flexibility
in the way any design can be approached, as long as it meets the overall specifications. This falls under
the category of a standardized performance task, but it is implemented in context with the use of a
computerized tool for developing the design.

Having developed the software for computer-aided design, ETS measurement scientists [illegible]
with some success, the process of determining what to measure during the process of using these tools.
NCARB wishes to use this design in a high stakes certification examination, but student architects and
working architects could benefit from such a design tool in their own office merely as a tool. As the ETS
measurement scientists determine ways for scoring excellent, acceptable and mediocre responses to a
design specification, this information could be used to provide instructional feedback. The scoring system
for excellence in design is referenced to process, not to norms that rank people. Therefore, the processes
that receive higher value could be suggested for architects still in training and the system could provide
practice and feedback in real-world tasks, including tasks like those to be given later on the certification

Marc M. Sebrechts, L. LeClaire, L. J. Schooler, and Elliot Soloway, Toward generalized intention-based
diagnosis: GIDE. In R. C. Ryan (Ed.), Proceedings of the 7th National Educational Computing
Conference, Eugene, Oregon, International Council on Computers in Education, 1986, pp. 237-244.

Marc M. Sebrechts, Randy Elliot Bennett, and Donald A. Rock, Machine-scorable complex constructed-
response quantitative items: agreement between expert system and human raters' scores. ETS Research
Report RR- -90, Oct 1, 1990.

examination.

ETS is following the diffusion research model by first substituting a computer for the paper and
pencil design tasks now used in a high-stakes exam, and graded holistically by humans. This pioneering
work nevertheless paves the way for a new approach to interactive help systems that uses unobtrusive
process measures during tool use.

The second example involves a simple word processing test with the ability to identify different
operations and tactics used by the skilled word processing professional. The test can give feedback that
is diagnostic by identifying specific processes that are and are not mastered. Since word processing
manuals and tutorials are quite complete for today's products, the test can be used for screening; but
beyond high stakes uses it will certainly be used in the field as a learning progress help system. The test
consists of standardized reference tasks (assigned papers) to type. Students can take the test as often
as they like. Feedback and instruction can be added to the interaction that occurs while students are
working on the standardized tasks. The sequence of such standardized typing assignments could take
students through all of the features of the word processor, providing an integration test with built-in
diagnostic feedback during the process.

These two examples of unobtrusive measures taken during tool use are illustrative of a new form
of measurement wholly different from previous measurement approaches. Process measurement can be
totally integrated with instruction, as well as with production and performance in both standardized and
creative tasks.

Representation Processing Software.

There is a growing family of computer tools that can be used for developing student exhibits. These
tools offer significant opportunities for integrating assessment with instruction. It is useful to call these
tools representation processing software, because each of them takes some form of information
representation (i.e., text, graphics, or numerical representations) and provides processing algorithms and
tools to prepare, shape and present representations using each of these forms of communication.

Text and Linguistic Processors. Tools are now available and will be improved in the future. These
include outline processors for creating and thinking about what is to be written, word processors and
linguistic processors. Linguistic processors are the least familiar, but a few are available in the form of
grammar checkers. It is possible to process information at a much deeper level than spell checking or
grammar checking, by looking at sentence structure and style.

In a project sponsored by ETS, a computational linguist, Eldon Lytle^* applied [illegible]
processing software, "WordMap," to essays that had been graded holistically by teachers. The WordMap
scores were as good as a second reader in general, and could therefore be used as one "reading" of an
essay. In low-stakes assessment this would save teacher and student time. Fellow students and teachers
could perform the definitive readings. In other unpublished research, he has pioneered the development
of scales of linguistic maturity. By processing selections of standard writing, a score can be generated that
compares the student's style on a continuous scale referenced to products typical of each grade. Selections
from great authors are placed on the scale, as well as selected writing examples of individuals known
within local schools or communities. The resulting scale has potential as a learning progress growth scale.

Text Search and View Processing Software. Large textual databases now available on magnetic or
optical memories allow students to formulate queries that direct the computer to search through texts and
retrieve interesting passages from several books about a topic of interest. They learn to assemble key
words, perform searches, and to extract and think across many documents. They may use "view
processing" software to assemble search material into an organized framework. It may be an outline
framework, a hypertext framework, or the process material may be clipped and put into a word processor

^*Lytle and Breeland, 1989. AERA paper.

for document integration.

Graphics. Programs like MacPaint, MacDraw, PC Paint, SuperPaint, and others are now widely
available. These programs give computer users the ability to develop graphic productions of great
sophistication. Libraries of "Clip Art" are available and can be used quickly by experienced users to
assemble quality presentations. Presentation software packages are available that combine both text and
graphics in View Graph formats. These displays can be presented directly from the computer screen or
printed out and turned into handouts and transparencies. These programs allow icons and line drawings
to be assembled, and going beyond, they allow photographic images to be scanned in with grey scale and
even color. More sophisticated tools allow video processing. These production tools bring with them good
inherent motivation. Students are able to produce exhibits that incorporate excitement, interest, humor
and self-expression. Moreover, students are challenged by the highly professional media they see around
them. Graphical information processing systems may be instrumented with unobtrusive measures and with
learning progress feedback.

Numbers and Formulas. Schools now use data bases and data sets for experiments in a variety of
fields. Tools include spreadsheets for numerical calculations, and software tools that allow mathematical
transformations, proofs of systems of equations, etc. For developing [illegible] models, special
software tools like Stella*° are used to develop general systems models with underlying mathematical
modeling components.

Propositional Knowledge. Systems that allow verbal and logical statements to be processed using
theorem checkers and theorem provers are not widely available in schools, but will be in the future. These
systems will be the descendants of current expert system shells. Some college classes use these tools
currently.

Process measures during tool use offer the best CEA alternative for assessing strategies. Strategies
are defined within the context of optional ways to use these tools, and also within sets of tasks having a
common goal structure. The use of the tools explicates the strategy and tactics in a way that is accessible
to measurement and feedback. Needless to say, a great amount of research and development is needed
before promising new kinds of process measures during tool use will enjoy widespread use in schools.

CEA systems utilize computers in the processes of development, distribution and analysis, as well as
administration. Technology trends in these areas will be discussed in the following section.

TRENDS IN DEVELOPMENT, DISTRIBUTION, AND ANALYSIS PROCESSES FOR FUTURE CEAs

Using Computers in the Development of Measurement Instruments

Current mark sense testing has a very well elaborated development process with supporting
technologies. Future CEA systems will require an infrastructure with new roles and skills for personnel,
as well as new technological tools. A selected few are described below:

o Job Analysis: New methods of job and task analysis will emerge. Many new jobs of the future will
involve technological tools in some way. Any of these technological tools can be augmented to collect
process data from both expert and developing job incumbents. Our prediction is that computerized
methods of collecting data from technology-using job incumbents will make it easier to determine the
types of tasks engaged in by them, and the different standards for performing these tasks. In
addition to online data collection, job analysts/observers will be able to capture information about
technical and non-technical tasks in video, audio and symbolic forms via notebook computers in real
time, at the location where work is performed.

o Item Banks: Item banks are now in use, but these are relatively primitive.
Although some use database technology, they have generally not integrated items fully with
multi-media editing capabilities. They have made the text easy to edit, but not the graphics. Item
bank development systems are currently designed to produce printed tests, not multimedia
interactive tests. At this writing, no test development group has integrated graphics, animations,
audio, and video into their production systems. The trend toward multimedia will change the nature
of item banks, along with the associated item editors, and the types of tasks selected from these banks
will improve and become fully integrated across media.

The trend toward performance tasks and exhibits will revolutionize the meaning of the term "item
bank". Many of the tasks stored in these banks will not be items at all, but standardized
performance situations. Some of these will consist of rather complex simulations with associated
scoring protocols and keys. Thus, the new item banks will become sophisticated computer programs
with associated multi-media files - simulation tasks written as objects, or composed of objects using
object-oriented programming techniques. There is potential for using these banks in multiple
situations so long as the new use can be accompanied by the development of a new scoring and
feedback protocol.

o Data Collection for Pretesting, Norm Development and Item Calibration: One of the most
significant breakthroughs in CEA could come through the development of a distributed network of
temporary data collection sites where response data (including pre-response data) could be collected
rapidly and transmitted electronically overnight to a permanent data center. This concept is depicted
in Scenario 2, and its significance for educational assessment cannot be overstated. Achievement test
norms now used in schools are rarely updated more often than once every seven years. As the
population changes, these norms become less and less appropriate as interpretive frameworks for
assessment. With on-line sampling for norm data, the update could be done annually.

It is a very costly process to prepare pretest materials for schools willing to function as data
collection sites, and to administer these instruments with the necessary quality control. Other costs are
transmitting the data back, scoring, and processing. The idea expressed in Scenario 2 enables a sample
of schools to provide responses to new sets of items and tasks, providing an exciting opportunity to
participate in an important national project, yet also get feedback useful to their own local programs.
National assessments like the NAEP survey could take place much more quickly once the infrastructure
of test schools (and the means for shifting and adjusting the sample on a regular basis) is developed.
In addition, state and local norms could be developed that more accurately reflect the dynamic and
changing demographics of communities.
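
The annual norm update suggested above amounts to recomputing a norm table from a fresh sample. The
following is a minimal sketch in Python, assuming raw scores arrive as a simple list and using the common
mid-point definition of percentile rank (percent scoring below, plus half the percent scoring at the same
value). The sample data and the function name are hypothetical and do not come from this paper.

```python
def percentile_rank(norm_sample, score):
    """Percentile rank of `score` within `norm_sample` (mid-point definition)."""
    if not norm_sample:
        raise ValueError("norm sample is empty")
    below = sum(1 for s in norm_sample if s < score)
    equal = sum(1 for s in norm_sample if s == score)
    return 100.0 * (below + 0.5 * equal) / len(norm_sample)


# Hypothetical raw scores collected in two different norming years.
norms_old = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
norms_new = [14, 16, 18, 20, 22, 24, 26, 28, 30, 32]

# The same raw score is interpreted differently once the norms are refreshed.
rank_old = percentile_rank(norms_old, 20)
rank_new = percentile_rank(norms_new, 20)
```

With these made-up samples, a raw score of 20 falls at a lower percentile rank under the newer norms, which
is the kind of drift that infrequent norm updates would leave undetected.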

Decentralized Development.

One of the most remarkable prospects for materials development will be the prospects for
decentralizing these complex and difficult development processes. We have already witnessed this rapid
decentralization in the evolution of microcomputer-based desktop publishing systems. As "desktop
multi-media production systems" become less expensive and more widely available, this phenomenon will
further extend these capabilities. The net effect is that schools, academic research facilities, and
commercial development and consumer groups can create the interdisciplinary teams necessary for
innovative and high quality materials development. Editable curriculum and assessment materials can be
developed and adapted to the unique needs of communities.

A vital link to making these local adaptations progressive and evolutionary is
the widespread implementation of formative evaluation software in schools where computer-aided,
interactive teaching and assessment materials are used.

New Possibilities for Formative Evaluation and Improvement. With local file servers and networking
to all of the interactive learning/testing stations within a school, response records on all students can be
kept for assessment purposes. Some process measures include response times for items and completion
times for lessons and modules. Software can summarize this data dynamically so that it is available quickly
to help determine what is or is not working.
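
As a rough illustration of the dynamic summary described above, the following Python sketch aggregates
logged response times per item. The record layout (student, item, seconds, correct) and the function name
are assumptions made for illustration only; the paper does not specify a data format.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_item_times(records):
    """Return per-item count, mean and median response time, and proportion correct.

    `records` is an iterable of (student_id, item_id, seconds, correct) tuples.
    This layout is hypothetical, not one defined in the paper.
    """
    times = defaultdict(list)
    correct = defaultdict(list)
    for _student, item, seconds, is_correct in records:
        times[item].append(seconds)
        correct[item].append(1 if is_correct else 0)
    return {
        item: {
            "n": len(secs),
            "mean_seconds": mean(secs),
            "median_seconds": median(secs),
            "proportion_correct": mean(correct[item]),
        }
        for item, secs in times.items()
    }


# Hypothetical response log, as it might be kept on a school file server.
log = [
    ("s1", "item_a", 12.0, True),
    ("s2", "item_a", 18.0, False),
    ("s1", "item_b", 40.0, True),
    ("s2", "item_b", 50.0, True),
]
summary = summarize_item_times(log)
```

A summary of this kind could be recomputed whenever new responses arrive, so that unusually slow or
frequently missed items become visible while a lesson is still in use.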

Ideally, there will also be incentives to encourage schools and districts to send their data to the
developers of the systems so that those developers will be able to obtain sufficient response data to
effectively revise the expensive and complex systems for greatly improved future editions. The data which
is actually used should be carefully sampled and representative. Since there will be an
increasing demand for localization of at least some of the modules in a system, it will be important for local
users to have access to formative evaluation data. Richard Snow and Ellen Mandinach have made this
point well in their important document on integrating assessment and instruction.

"Formative evaluation, and adaptation to local circumstances, is the sine qua non of research and
development on systems that integrate instruction and assessment (SIIA). We do not imagine SIIA as
being 'designed' and then 'implemented,' as those terms are usually used. Rather, we expect that each
such system will evolve in its time, place and domain as a function of continuous monitoring and
tinkering, even though each may start from the rough common scheme described in this report...

It follows that SIIA design should contain provisions for continuous monitoring and evaluation of
its own functioning in each usage."

Trends in Distribution for CEAs.

Distance learning, facilitated by telecommunications technology, represents a promising trend that
can influence the distribution of CEAs. As mentioned in Section 2, it has already been used for distance
registration and for testing at permanent testing centers. Moving beyond the distribution of assessments
to fixed as well as temporary locations, telecommunications will be used to make interactive help systems
available for many subject areas to people in schools and in nontraditional learning environments. Help
systems do not require the data security considerations inherent in high-stakes assessment, thus providing
rich opportunities for innovative, interactive multi-media applications.

CEA and help systems can also be integrated into retraining programs for teachers, district
administrators, etc. CEA and help systems applications can benefit from distance learning as a means of
in-service training. Broadcasts featuring outstanding teachers could be followed up locally with practice
using built-in professional development materials in the help systems.

Magnetic or other portable digital media, like magnetic disks or optical disks, offer
additional delivery alternatives to mailing printed tests or to using telecommunications as a means of
distributing tests to remote sites. Portable media are also an alternate means of sending response data back
to central sites. Portable media in combination with express mail and overnight delivery options may
sometimes offer a more cost effective assessment distribution alternative.

Trends in Analysis and Record-keeping

Among the problems encountered in implementing computerized instruction and assessment systems
is the overwhelming amount of data that can be recorded about each student response. Coherent models
for recording, prioritizing, organizing, retrieving and analyzing all the new kinds of time and response data
have not yet been developed. Advances in measurement science are closely tied to advances in the
technologies and analysis methods associated with record-keeping.

Snow, R. E., and Mandinach, E. B., Integrating Assessment and Instruction: A Research and
Development Agenda. 1989, Pre-Publication Draft.

Richard E. Snow and Ellen B. Mandinach, Integrating Assessment and Instruction: A Research and
Development Agenda. Research report, Educational Testing Service (in press).

As measurement science begins to develop models utilizing time as a key variable, analysis and
record-keeping systems will be revised to deal with time data. The time variable has the prospect of
becoming one of the most important variables in assessment systems and associated instruction. The time
spent on lessons, and the utilization of lessons and of the smaller modules within
them, can be obtained from on-line systems. The interpretation of time statistics becomes very important.
Some modules will stand out as being extremely time-consuming, others as
undesirable and potentially confusing. The combination of time, errors, use, and avoidance of certain
features becomes data for formative evaluators of the future who are researching effective methods and
how to improve them. When learner choice is invoked, avoidance becomes a potentially useful variable
relevant to assessment of conative objectives. Why were certain carefully crafted and important modules
attended to briefly then later avoided?

The trend toward portability can affect analysis and record-keeping. In the early 1980s, the use of a
credit-card-sized data storage card was proposed for all military personnel.
Imprinted with registration information, the card would be capable of containing all information gathered
about training assignments, and achievements gained throughout each recruit's career.

Several technologies [...]. One technology places a microchip into the
card which contains [...]. The familiar magnetic stripe credit card
or teller machine card does not store enough information to make [...]
part of this application) is now a reality.
Blue Cross/Blue Shield uses such cards for [...]

^ '^y*^ win mean that the work roles of the materials 
d^^owtteadnu^^ CEAi,yst«ms,^eS% 
^afafa^ mtmuite^y with ins^^ Th^Zb^ean 

Indeed, improved outcome assessment is essential
to many calls for substantial educational restructuring.

com^tt^ as produce of their own products, as monitors of their own growth ^JgZ^Z^ 

^r^^^'^^'^^haw^ Teachers Mfin a mwiSIS^ 

^d^-n^roleasassessorjm^ 

PJJ^O^^^^tetion, much as other profe«riomils who have previousbr ^ed out of aJS^^ 

Among these many role changes, the role of the teacher is the most central.

How CEA Systems Will Impact the Role of Teachers

h^^Sr^* substantial technotogy interventions into Uie classroom have fenen short of promised 
bra^mitilteacherscouldaccep^ Managing a comrentionaldaZLgrSSXi 

teachers to a set of mental models about what is imDortant^^^ 

hthewvtocawthtocurrtailim Attaitfcm to <««i iirffaMud to M^Mth* p™^^ 



"^Kems, Thomas, and Bowser, John books. 
Both the shift in role perception [...] The former
seems to take from 3 to 5 years with an Integrated Learning System.

CEA, especially help systems that integrate assessment and instruction using computer technology,
provides even greater demand for both role shifts. In this pressure to learn new
ways, teachers and other educators will be experiencing [...]
jobs that demand that they prepare their students for such changes. But many teachers themselves are
recent products of a [...] curriculum. Not all have the thinking skills society wants them to inculcate
in the young. Not all of them have the confidence or willingness to learn new computer skills,
measurement know-how and assessment sophistication, ability to use representation-processing software
tools, or barcode readers, or portable computer scoring systems for holistic scoring. These future
technologies move into the inner sanctum of the classroom. At least ILS systems remain in a room down
the hall, and have a trained systems administrator on duty to take care of the hardware and software
complexities.

So if role changes for ILS systems take 3-6 years, how long will it take for a more
sophisticated member of the ILS family?

No one knows yet, but the prospect is not as bleak as it appears to be. Not all teachers have to
adopt systems like those described in this section at once. The early-adopter teachers are already eager
and willing to try it. They have already thought of their own solutions to many of the
measurement and management problems CEA systems are designed to solve. America will learn from
these early adopters, especially if we back them up with formative evaluation and look for slow, progressive
evolutionary improvement, not revolution.

Policy issues are found on both the educational side and the technology side. Educational policies
have been discussed. Technology policies and standards in telecommunications, computer use, and
software can affect the ability of researchers, developers, and practitioners to use these technologies in the
manner proposed.

Barriers to Widespread Implementation

The largest barrier to the introduction of CEA systems using interactive computers is the
implementation of the infrastructure. The hardware must be installed in schools, and the professionals
must be trained to adapt to new technological aspects of their jobs. They must be helped to expand and
define their own roles in a progression that will take several years, and will find them performing different
activities and different roles than at present. Besides the infrastructure barrier, several other barriers
currently exist.

Hardware Compatibility Barriers.

There are two major computer hardware standards which exist in schools. These are represented
by Apple and IBM microcomputers and their compatibles. Only recently have hardware and software
solutions been provided which allow Apple programs to run on IBM hardware and, vice versa, IBM
programs to run on Apple hardware. An additional barrier is the limited processing speed, text, and
graphics capabilities of many of the current microcomputers implemented in schools. Many of the school
computers are eight-bit computers, while the state-of-the-art [...]. Text
resolution on many of the computers in schools is limited to 40 characters per line and 24 lines per page.
This limitation is very restrictive for reading comprehension items using extended text passages or for
problem solving items. The graphics resolution of many of the microcomputers [...]
pixels per screen. This graphics limitation is very restrictive for presenting realistic line and shaded
graphics. A further barrier is the incompatibility between [...]. Most significant
for CEA, record keeping requires networked PCs. Stand-alones will have limited use in CEA.
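
To make the text-resolution constraint concrete, the following Python sketch estimates how many
full-screen pages a reading passage would occupy on a display of 40 characters by 24 lines (960 character
cells). The sample passage, the word-wrap behavior, and the function name are illustrative assumptions;
actual school software of the period may have laid out text differently.

```python
import math
import textwrap

COLUMNS = 40  # characters per line, as cited in the text
ROWS = 24     # lines per screen, as cited in the text


def screens_needed(passage, columns=COLUMNS, rows=ROWS):
    """Number of full-screen pages needed to show `passage` with word wrap."""
    lines = textwrap.wrap(passage, width=columns)
    return math.ceil(len(lines) / rows) if lines else 0


# Hypothetical 300-word reading passage made of five-letter words.
passage = " ".join(["words"] * 300)
pages = screens_needed(passage)
```

Under these assumptions even a short 300-word passage spans several screens, so a student answering a
comprehension item would have to page back and forth to consult the text.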

Hardware Availability Barriers.

Although many states and districts are implementing computer technology as rapidly as is financially
feasible, there are still a large number of states, districts, and schools that lack a sufficient number
of computers for use in computerized testing. Industry standards typically recommend that schools have
a minimum of thirty-two computer workstations for a school of 600 students and sixty-four computer
workstations for a school of 1,000 students. This standard may be too low. The hardware availability
barriers are further compounded for disadvantaged schools and districts unless they use the Federal
Chapter 1 and Chapter 2 monies to purchase computer hardware for curriculum and computerized testing
needs.
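
The arithmetic behind the cited standard can be sketched as follows in Python. The two school sizes and
workstation counts come from the text; the assumption that each student needs one workstation for one
sitting of a test, and the function names, are illustrative only.

```python
import math


def students_per_workstation(students, workstations):
    """Average number of students sharing each workstation."""
    return students / workstations


def sittings_to_test_everyone(students, workstations):
    """Minimum number of test sittings if each student needs one workstation."""
    return math.ceil(students / workstations)


# Figures cited in the text: 32 workstations per 600 students, 64 per 1,000.
ratio_small = students_per_workstation(600, 32)
ratio_large = students_per_workstation(1000, 64)
sittings_small = sittings_to_test_everyone(600, 32)
sittings_large = sittings_to_test_everyone(1000, 64)
```

Under this assumption, a school at the recommended minimum would need well over a dozen separate sittings
to administer a single computerized test to every student, which supports the remark that the standard
may be too low.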

Software Availability Barriers.

There are only a small number of companies that currently provide professional computerized testing
software and computerized tests for schools. Thus, even if schools have the hardware available, it is
unlikely that they will have computerized testing software or computerized tests which will run
on their hardware. There is a need for the development of professional standards for import and export
of computerized testing software, item banks, and computerized tests.

As discussed above, the integrated learning system market is approaching one billion dollars, and has
provided a foundation for the evolution to integrated learning and assessment systems. In this
environment, it is probable that computerized testing will make significant inroads by offering
computerized performance simulations for higher order thinking skills, tool software that assists in the
development and evaluation of student products, and helps for teachers and students to integrate
instruction and assessment.

Experience and Communication Barriers.

Many of the nation's teachers and students have never taken a computer-administered measurement
instrument or used a computerized test development and administration system. Teachers and older
students are reluctant to embrace new educational technologies until they have had sufficient personal
experience with the technology. In general, the younger the student, the less the fear of trying it.

There is a very small group of educational measurement professionals who are conducting research,
developing computerized testing products, and preparing research and dissemination papers on
computerized testing. The ERIC Clearinghouse on Tests, Measurement and Evaluation at the American
Institutes for Research collects bibliographic references and abstracts of research on computerized testing
and computerized adaptive testing. The Buros Institute of Mental Measurements, at the University of
Nebraska, also has a computerized testing review and information exchange system. Still, there are very
few educational measurement professionals, and even fewer educational professionals (superintendents,
principals, curriculum personnel, and teachers) who are acquainted with the research base, computerized
testing demonstration systems, and sources for seeking further information.



SUMMARY

In Section III, more evidence has been presented to support the policy recommendations made in the
last section. In particular, new technologies are making possible much more variety and potential
cost-effectiveness in the administration of performance tasks and student exhibits. Group technologies and
portable systems will offer even more options. Holistic process measures, judged by teachers and students,
are now possible using the intermediate products from representation processing software tools. In the
future, direct process measures will be possible during student engagement with computerized
tasks and during tool use.

Such process measures can be used to integrate assessment with instruction in the unobtrusive and
continuous fashion recommended in this paper. These developments in technology present an opportunity
to move in the directions of new assessment practices capable of assessing higher level educational
objectives. Computer technology provides the new variable in the decision logic of how to achieve the
educational goals our country needs.

It is true that extensive R&D, and expensive and long-term professional development of educators
is needed, but the current scene already includes both performance tasks and tool use for student projects.
The recommendation is not to wait years for the R&D to be completed, but to encourage many projects
and put each of them into a Formative Evaluation improvement loop. Progressive and evolutionary
transformation of America's educational systems can only come about through slow processes that build
from where we are now. Progressive transformation can only be accomplished by the professionals who
will use the new formative evaluation and individual assessment tools.

SECTION IV: FINDINGS AND RECOMMENDATIONS

FINDINGS: CURRENT USES OF COMPUTERS IN ASSESSMENT

The findings are summarized in the Executive Summary in terms of three forms
of technology diffusion: substitutive, incremental, and transformational. Another perspective on current
uses and trends in CEA is given by the following key findings:

1) Purpose of Assessment:

High-stakes assessment dominates in both individual assessment and program evaluation. Today's
testing industry exists primarily to justify high-stakes decisions. Decision-makers want to assure their
clientele that their decisions are valid and fair, and are based on a dependable scientific standard. Thus
they are willing to pay for testing of individuals (or require individuals to pay). They will
sometimes pay for summative evaluations, even though program evaluation takes a secondary position to
individual assessment.

2) Objectives of Assessment:

Despite the fact that educators universally desire the equivalent integration, transfer, and creative
production objectives, scaffolding objectives dominate assessment practice. Essays and other student
projects are emphasized where more than lip service is paid to this desire. Another positive trend is to
use more extended problems in math and science, much longer and more challenging than the short
standardized items, but class schedules interfere if problem engagement takes more than forty minutes.

The introduction of computerized educational assessment was found to offer fundamental hope for
changing these findings. In Section III, it was shown how curriculum developers lack a coherent set of
achievement constructs and ways of measuring them. In writing their objectives for a curriculum,
developers have depended on verbal objectives that lead to conventional test items. These do not take long
to administer, do not require expensive and time-consuming holistic assessment, and are familiar and
accepted. The print delivery system does not offer powerful and feasible ways to achieve objectives more
complex than scaffolding objectives. Networked computer workstations, however, can implement and
disseminate standardized performance tasks, opportunities for student exhibits and portfolio management.
They can enhance the practice of discussing and even measuring process during tool use. These
measurement methods offer the possibility of getting at the higher objectives of integration, transfer,
creative production, and strategy improvement. In so doing, they also offer the possibility of achieving the
conative objectives: interest, motivation, and persistent effort.

3) Measurement Practices for Individuals:

o Standardized tests based on short items dominate. However, holistic scoring of student products
is gaining momentum, despite high costs. Raters must be trained, and common standards must
be set for multiple raters.

o Integrative performance tasks have taken a strong hold in military and industrial training and
assessment. CEA systems offer an attractive way to extend these benefits to schools.

o Tools to aid in producing student products offer new opportunities both to increase this form of
learning activity and to assess process through unobtrusive measures taken during tool use.
Assessment of strategies is possible with this new measurement practice.

4) Program Evaluation:

Summative evaluation predominates over formative, but it is recommended that this balance shift
the other way. Formative Evaluation is typically less favored, perhaps because it is expensive and labor
intensive if done as a separate activity, and requires a continuing expenditure of development funds to act
on the results and make the improvements indicated. Learning progress assessment is used in some
classrooms, but is neither formalized as a process nor computerized to any extent. Most evaluations use
individual measures of achievement, summarizing and aggregating them at school, district, state, and
national levels. It is recommended that direct measures of system performance be utilized more fully.
This is possible with integrated learning/assessment systems and with group-based systems.
In addition, long-term measures beyond immediate post-tests should be investigated; the results are often
surprising.

5) CEA Administration Alternatives:

o Scannable answer sheets dominate, with all the back-up technologies these imply. Because of
the inherent strengths and probable continuing viability of answer sheet testing, it is vital to
extend assessment options by using a wider variety of item types, now possible using bubble
answer sheets, but underutilized. Many more item types are possible using higher resolution
scanners.

o Student exhibits are receiving increased emphasis, both as standardized and unstandardized
tasks. The expertise of students in assessment of their own products and those of others
increases when portfolio methods are used.

o Educational workstations and special workstations in labs are becoming more widely available.
When connected to networks, these open up new possibilities for powerful forms of CEA in all four
of the measurement methods.

o Portable response units and notebook computers offer the opportunity to put the advantages
of educational workstations into the classroom.

o Computer graphics integrated with video and with active response offers new possibilities for
visualizing difficult concepts. Computer graphics based on scientific models has made it possible
to visualize for scientists, and in the future for students, phenomena from the very small to the
very large.

6) The Role of the Testing Industry in Introducing Computerized Educational Assessment

It was found that, for a number of reasons, testing companies are not likely to be the leaders in
introducing computerized assessment or instruction integrated with assessment. Quite aside from the
constraints operating on the testing companies, the installation of hardware in schools is being driven by
instruction, with assessment as an afterthought. Thus it is more likely that companies and state
organizations concerned with instruction and with strong savvy in the key technologies will provide the
leadership. These companies have people who have learned the lessons from Computer-Managed and
Computer-Assisted Instruction. They have instructional developers on their staffs who know how to
develop excellent interactive instruction.

Organizations strong in instructional technology using interactive, networked computers presently
are deficient in measurement expertise; however, it is becoming a competitive advantage to have strong
measurement and assessment components and to integrate these well with instruction. Therefore, these
organizations will find ways around the lack of measurement expertise. People can be hired, but few have
the combination of measurement, instructional, and interactive computer know-how. Good people can be
developed over time. Testing companies may help provide the measurement expertise if they are willing
to work with the instructional technology companies as a new kind of client.

It is the clients of the testing companies who must provide the leadership in Computerized
Educational Assessment if it is to come from the testing infrastructure at all. Some of the old clients, in
particular state education agencies, membership organizations made up of schools and colleges, and
professional societies are well positioned to be leaders in computerized assessment and even assessment
integrated with learning and instruction. Whether or not these existing users of the services of testing
companies step forward as leaders, a variety of new clients are emerging.

Testing companies and the measurement profession that stands behind them have been leaders in
developing and promoting professional standards for test use. The new technology companies, including
ILS companies, software companies, and hardware companies, are generally not familiar with these
standards, and may not be sympathetic with them. It will be a challenge for policy to see that high
standards are maintained and that the profit motive will not overwhelm attention to standards.
Maintaining high standards is not without its costs.

7) The evolution of the new infrastructure.

At the end of Section II considerable attention was given to the current testing infrastructure and
the needed new one. This new infrastructure is distributed to schools and colleges as hardware installed
in classrooms and labs. This equipment will be useless unless the human capabilities are developed to use
it. It behooves the developers of new educational technologies to attend to the traditions, patterns, and
habits of operation of schools and teachers if they want to develop a market that more easily appreciates
and understands their products.

The companies have developed their products around the concept of individualized instruction in
learning labs where the students work independently. Individualized instruction is alien in the main to
schools as they now operate. Extending ILAS concepts into classrooms through group-paced,
presenter-controlled activities offers great promise for the future. The technologies of interactive computers
for display on projectors or large monitors, and the technologies of portable individual response units seem
naturally suited for the introduction of ILAS concepts into classrooms. The goal of better learning and
instruction through integrated, unobtrusive, and helpful assessment in classrooms is worth the effort it
will take.

THE ARGUMENT OF THIS PAPER IN A NUTSHELL

A shift has occurred in the purposes of testing. Selection and judging are not the central problems
for educational measurement any more. Improving achievement and progress for a demographically
diverse student population, to standards achieved by only a few now, has become the central problem.

Measurement relates to this need to enhance learning progress in a fundamental way. Achievement
constructs are needed, and associated standards of excellence that go beyond the simplistic minimum
competencies that are current in so-called reform efforts.

Measurement and assessment are fundamental to all occupations and professions because decision
making is fundamental to all, and decision-making depends on sensitive judgments based on appropriate
and accurate information.

Improvements in educational measurement practice continue to be vital because it is essential to
know what to measure and how to measure each important construct. A generally acceptable set of
achievement constructs with associated measurement methods effective for each type of achievement needs
to be developed. Table 8 is a simple model for this concept. Good measurement practice also requires a
way to manage the measurements and other data. Mainframe computers have been used in this task in
the past. Decentralized computers in classrooms and labs will be used in the future. Good measurement
practice requires in addition a way to keep the measures up to date. Systems that perform all of these
functions are presently called Integrated Learning Systems (ILS), but they are weak on measurement, less
so on management, and getting stronger on instruction. They provide the framework for introducing and
using good measures, and when they do, they could more appropriately be called Integrated Learning and
Assessment Systems (ILAS).

Good assessment relates to the nation's need to enhance learning progress in a fundamental way.
The high-level objectives that go beyond scaffolding knowledge currently require sensitive interpretation
of student performances and products, and of the processes involved in each. Those who perform the
assessments are the teachers and the more advanced students. No outside agency, human or machine,
can do it. These people require a clear conception of the standards of excellence. Teachers need to keep
improving their ability to relate student performances and products to these standards. They also need
to keep improving their ability to provide instruction, hints, and helps that will enable all students with
the desire to learn to achieve. Assessment also involves wise decision-making: making decisions about
how to guide students into paths that will promote learning progress. In short, good assessment practice
involves both knowing how to interpret data in context with other information and knowing how to use
the interpretations in day-to-day educational decision making.

In contrast to measurement as practiced in some other occupations, educational measurement plays
a peculiar and skewed role. There is a lack of consensus on what should be measured, a lack of
measurement instrumentation (except not fully appropriate printed instruments), and a lack of an
infrastructure for obtaining the appropriate measures and keeping them up-to-date. Educational
measurements are skewed toward measurement for high-stakes decisions imposed on the teachers and/or
the learners from outside.

Computer systems can integrate administration of measurement instruments, presentation of
instructional materials, record-keeping, and management of records and of instructional activities.
Computers interact, so they can provide optional advice and help from moment to moment, not just at
formal workshop inservice sessions. This can provide a new kind of growth environment for classrooms
and learning labs. The growth environment is important for teachers as professional instructors and
assessors, and for students as future contributors and assessors. The introduction of appropriately
configured computer systems is the most promising way now visible for providing this new environment
for productive learning, assessing, and growth for all people in the system.

Top-down implementation models that do not involve the teachers, other educators, and students in
a process of slow growth will probably not work. Even if the perfect Integrated Learning and Assessment
System were fully developed today, it would be rejected by users unless they were able to develop their
own roles. Therefore, it is better to start where the users are now, and introduce formative evaluation
methods, enlisting the aid of the users to improve the systems over subsequent generations based on
formative evaluation data. Formative evaluation, then, is more than a better method of program
evaluation; it is a fundamental aspect of implementation strategy.

RECOMMENDATIONS

Recommendations Dealing With the Purposes and Methods for Testing in the Schools

Recommendation One: Greatly increase the frequency and variety of help services compared to high-
stakes assessments, but balance the two.



1.1 Through research and development funding and policy support, encourage the development of
high-stakes measures that can be integrated and correlated intimately with the curricula at a
few key milestones and that measure more integrated and comprehensive achievement objectives
than the scaffolding objectives now common.

1.2 Reduce the use of item tests and other scaffolding-level tests for incremental grading practices
and shift the burden, both of grading students and holding teachers accountable, to these more
carefully developed, more integrated, and less narrowly construed high-stakes measures.

1.3 Support research, development, and implementation of help systems that integrate measurement
with instruction.

1.4 Provide training for both teachers and students in holistic assessment of standardized
performance tasks and student exhibits and of the intermediate products and processes leading
up to these final products.

Recommendation Two: Increase the frequency of formative evaluation, and provide funding and
incentives to use the evaluation data for on-going improvement of educational programs.

2.1 Encourage research and development on direct measures of system utilization and performance,
as well as measures of student achievement, also on methods for summarizing these measures,
with attention to student privacy issues, for the use of developers in revising and improving
curricula and assessment materials.

2.2 Make summative and formative evaluation continuing processes rather than special projects,
while utilizing the carefully developed high-stakes measures proposed in Recommendation 1.1
for continuing summative decision-making.

2.3 Since summative decisions are of the go/no-go variety, and do not contribute to improvement,
administrators should be encouraged to provide resources to improve based on formative
evaluation.

With the shrinking teacher force projected over the next decade, increased use of technology
may be the only cost-effective way to implement new learning-oriented approaches to education.
But new systems must be installed and improved over time as the teaching force learns new
roles and new technical skills.

Recommendation Three: Increase the use of alternate methods of assessment.



3.1 Support research, development, and implementation which places more emphasis on
measurement methods of performance tasks, exhibits, and process measures during tool use, and
thus on teaching and learning of the high-order constructs of integration, creative
production and strategies.

3.2 Increase the variety of item types for the item tests that measure scaffolding knowledge. This
kind of knowledge can conveniently be measured by verbal multiple choice or short-answer
items, but there are many new item types that can be delivered on both paper and pencil and
computer that should be further developed and used.

3.3 Encourage the development and use of standardized performance tasks, such as simulations of
complex and realistic situations, games, and [illegible]-like laboratory environments that allow students
to make choices and decisions and observe the realistic consequences. Utilize these
computerized performance tasks to assess integrated performance objectives.

3.4 Support research and development leading to automated scoring schemes for computerized
performance tasks, but in the meantime encourage the use of holistic scoring techniques on the
part of both students and teachers.

3.5 Support and encourage the development of methods for using student exhibits as a part of
assessment integrated with instruction. These methods include portfolio management
procedures and require that students and teachers be taught holistic scoring methods for essays,
performances, student-generated experiments, etc. Use these measurement methods as part
of assessment of creative production objectives.

3.6 Encourage the development of measurement methods that assess the strategies
students develop by holistic methods. This recommendation applies both to performance tasks
and exhibits, and emphasizes process measures. Use these methods to assess strategy
improvement objectives.

3.7 Support the development of automated computer scoring of process measures during student
use of computer tools like word processors, spreadsheets, or presentation packages. Investigate
recording student responses while they are using computer tools, and develop software to
provide hints and helps on strategy at the moment of need.

Recommendation Four: Foster new item types and uses of portable answer media in order to utilize
the current testing infrastructure more creatively.

4.1 Provide and encourage R&D funding and seek to install policy-based incentives for organizations
that perform testing to encourage them to introduce new item types beyond multiple choice.
New item types should offer new methods for obtaining student responses on answer sheets,
using high resolution scanners and other technologies.

4.2 Encourage the development of assessments integrated with instruction that utilize the new
answer sheet item types and use them for practice and feedback in connection both with group
presentations and with individual seatwork.

Recommendation Five: Encourage the development of the new, localized infrastructure of Integrated
Learning and Assessment Systems, and the coordinated evolution of central sites for development of
help systems and tests, and for R&D.

5.1 Encourage the further development and implementation of computer-based approaches to
integrating learning and assessment. Encourage many approaches rather than one kind of
concept and product.

5.2 Encourage the evolution of what might be called an ILAS industry through funding R&D [...]
integrated instructional and assessment materials, and to conduct research and development.

5.3 Support research, development, and implementation for group-oriented systems that integrate
instruction and assessment. Needed technology includes projectors and software for teachers
to present excellent instructional materials with integrated group-paced assessments. The
display capabilities should include color, graphics, video, and audio. The student response entry
technology that needs to be developed includes response pads, infra-red linkages, and student-
oriented portable computers.

Recommendation Six: Encourage the professional development of teachers and other professionals
who are knowledgeable and skillful about the technical aspects of CEA,
and are skilled at integrating assessment with instruction.

6.1 State agencies, school districts, and professional associations should encourage conferences,
publications, and program development activity to effect this recommendation.

6.2 Provide incentives for colleges of education to introduce new programs for the development of
professionals who can provide skilled holistic assessment and who can integrate assessment with
instruction.

6.3 Support research and development which will lead to "built-in" computerized consultants and
advisors in integrated learning and assessment systems to provide a continuous professional
growth program for teachers who are users of the systems. The consultation and advice will
occur at the moment of need during the school day, not limited to summers, released time or
weekend workshops.

Policy Recommendations

Recommendation Seven: Federal and state policy should both provide R&D funds and stimulate
private sector investment in improving technology-based assessment practices.

Effective policy that focuses on stimulating R&D and innovative product development is
recommended to validate enhanced high-stakes assessment options, also to stimulate the creation of
the high quality help systems proposed in this paper. These types of measurement systems can no
longer be left up to the teachers to develop in their spare time. The types of research and
development are detailed in Recommendations One through Six.

Recommendation Eight: Encourage the continuation and improvement of high professional testing standards.

Computerized testing requires faithful adherence to the established professional standards for
test construction and evaluation, standards for test use, and standards for administrative procedures.
These standards have been codified in the following professional references: Standards for
Educational and Psychological Testing; Guidelines for Computer-Based Tests and Interpretations;
Code of Fair Testing Practices in Education; and Technical Guidelines for Assessing Computerized
Adaptive Tests.

Professional assessment standards require careful focus on technical issues of test validity, test
reliability and errors of measurement, scaling, norming and equating of computerized tests as well
as issues related to computerized test administration, scoring and reporting, and protecting the rights
of test takers. The purpose of the standards is to provide detailed criteria for the evaluation of tests,
testing practices and effects of test use. Standards help to ensure that test developers and
administrators focus on issues of validity, comparability, equity, ethical issues, bias, and confidentiality
of scores.

The advent of Learning Progress help systems may require modification of these standards. For
example, reliability is vital in a high-stakes admissions test, and must be bought at a price: more
items and more testing time. Admissions is a high-stakes decision, but whether to take a learning
module or not is a low-stakes decision that can easily be corrected if a decision didn't work. It is not
worth the time and cost of the high-stakes standard for reliability. By contrast, construct validity,
what is really being measured and learned, is of utmost importance in both kinds of systems.
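
The trade-off between reliability and test length described above can be illustrated with the Spearman-Brown prophecy formula from classical test theory. The paper does not name this formula; it is offered here only as a minimal sketch, and the 20-item test and the reliability figures are hypothetical values chosen for illustration.

```python
import math


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by length_factor.

    Assumes the added items are parallel to the existing ones
    (classical test theory); real items rarely satisfy this exactly.
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    if length_factor <= 0.0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


def length_factor_needed(current: float, target: float) -> float:
    """Factor by which the test must grow to reach the target reliability."""
    if not (0.0 < current < 1.0 and 0.0 < target < 1.0):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1.0 - current) / (current * (1.0 - target))


# Hypothetical example: a 20-item module quiz with reliability 0.70.
items_now = 20
factor = length_factor_needed(0.70, 0.90)   # roughly 3.86 times as long
items_needed = math.ceil(items_now * factor)
print(f"lengthen by a factor of {factor:.2f}, about {items_needed} items")
```

Under these assumptions, raising a short quiz to a high-stakes reliability level requires nearly four times as many items, which is the price in items and testing time that a low-stakes help system need not pay.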

8.1 Policy should maintain a continuing emphasis on equity in evaluating CEA systems. The intent
is to provide equality of educational opportunity for disadvantaged groups as well as advantaged
groups. If the more advantaged districts and schools purchase and implement computerized
testing technology, while the disadvantaged districts and schools do not, then there will be an
inherent inequality of educational opportunity. When economically advantaged students have
greater access to computerized technology in their homes than economically disadvantaged
children, then there will be some level of inequality of educational opportunity.

8.2 Policy should emphasize fairness issues in the development and implementation of CEA systems.
The Code of Fair Testing Practices in Education was developed to safeguard the rights of test
takers. With computerized tests, the intent of the fair testing practices is to provide a fair and
appropriate test for each examinee whether the test is administered by computer or by paper
and pencil. This translates into support for studies of item or test performance differences for
a particular kind of test for members of age, ethnic, cultural or gender groups in the population
of test takers. Such research should be designed to detect and eliminate aspects of test design,
content, or format that might bias test scores for particular groups.

8.3 Support studies that assure statistically that two score scales are equivalent (equating studies)
in order to establish the degree of comparability of scores from computerized tests and paper-
administered tests when both forms are administered to the same population.
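
As an illustration of what an equating study computes, the sketch below applies linear equating, one of the simplest equating methods, to scores from a single group that took both forms. The paper does not prescribe a method; the function name and the score lists are hypothetical, and an operational study would use far larger samples and would check the fit of the linear assumption.

```python
from statistics import mean, pstdev


def linear_equate(computer_scores, paper_scores):
    """Return a function mapping computer-form scores onto the paper-form scale.

    Linear equating matches the means and standard deviations of the two
    score distributions: y = mean_y + (sd_y / sd_x) * (x - mean_x).
    """
    mean_x, mean_y = mean(computer_scores), mean(paper_scores)
    sd_x, sd_y = pstdev(computer_scores), pstdev(paper_scores)
    if sd_x == 0:
        raise ValueError("computer-form scores have no variance")
    slope = sd_y / sd_x

    def convert(score):
        return mean_y + slope * (score - mean_x)

    return convert


# Hypothetical scores from the same five examinees on both forms.
computer = [10, 12, 14, 16, 18]
paper = [20, 24, 28, 32, 36]

to_paper_scale = linear_equate(computer, paper)
print(to_paper_scale(16))  # computer-form score of 16 expressed on the paper scale
```

If the converted computer scores and the paper scores agree closely across the score range, the two administrations can be treated as comparable for the population studied.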

IN CONCLUSION 



America's needs are great, but American ingenuity has been at work overtime to come up with
technological tools and ideas for using them in education. The tools and ideas cut across many disciplines,
but there are other ideas, embodied in systems that integrate management, instruction, and assessment
into coherent and usable systems. Systems exist now for equipping learning and testing rooms as
permanent centers. Systems also exist, and new ones are predicted, that reach out into the classrooms and
provide support for presentation, assessment, records management, and practice.

There is much work to be done in the areas of science, technology, infrastructure building, and
support as educators' roles evolve. America has met challenges before, and can and will meet this one.






